NOVEL MAMMALIAN G-PROTEIN COUPLED RECEPTORS HAVING
EXTRACELLULAR LEUCINE RICH REPEAT REGIONS 

Introduction 

Field of the Invention

The field of this invention is the G-protein coupled receptor family of proteins.

Background

Gonadotropins (luteinizing hormone, LH; follicle stimulating hormone, FSH;
chorionic gonadotropin, CG) and thyrotropin (TSH) are essential for the growth and
differentiation of gonads and thyroid gland, respectively. These glycoprotein hormones
bind specific target cell receptors on the plasma membrane to activate the cAMP-protein
kinase A pathway.

The receptors for LH, FSH and TSH belong to the large G-protein-coupled, seven
trans-membrane protein family but are unique in having a large N-terminal extra-cellular
(ecto-) domain containing leucine-rich repeats important for interaction with large
glycoprotein ligands. Studies suggest that in these receptors, the extra-cellular leucine rich
repeat region serves as a "baseball glove" which efficiently catches its corresponding large
hormone ligand and optimally orients it for interaction with the seven trans-membrane
helical domain of the receptor.

Because hormones and receptors play a prominent role in a variety of
physiological processes, there is continued interest in the identification of novel receptors
and their ligands, as well as the genes encoding the same.

Relevant Literature

References of interest include: el Tayar, N., "Advances in the Molecular
Understanding of Gonadotropins-Receptors Interactions," Mol. Cell. Endocrinol.
(December 20, 1996) 125: 65-70; Bhowmick et al., "Determination of Residues Important
in Hormone Binding to the Extracellular Domain of the Luteinizing Hormone/Chorionic
Gonadotropin Receptor by Site-Directed Mutagenesis and Modeling," Mol. Endocrinol.
(September 1996) 10: 1147-1159; Thomas et al., "Mutational Analyses of the
Extracellular Domain of the Full-Length Lutropin/Choriogonadotropin Receptor Suggest
Leucine-Rich Repeats 1-6 are Involved in Hormone Binding," Mol. Endocrinol. (June
1996) 10: 760-768; Segaloff & Ascoli, "The Gonadotropin Receptors: Insights from the
Cloning of their cDNAs," Oxf. Rev. Reprod. Biol. (1992) 14: 141-168; Braun et al.,
"Amino-Terminal Leucine-Rich Repeats in Gonadotropin Receptors Determine Hormone
Selectivity," EMBO J. (July 1991) 10: 1885-1890; and Segaloff et al., "Structure of the
Lutropin/Choriogonadotropin Receptor," Recent Prog. Horm. Res. (1990) 46: 261-301.

Summary of the Invention

Three novel mammalian G-protein coupled receptors having extra-cellular leucine
rich repeat domains, i.e. LGR4, LGR5 and LGR7, and polypeptide compositions related
thereto, as well as nucleotide compositions encoding the same, are provided. The subject
proteins, polypeptide and nucleic acid compositions find use in a variety of different
applications, including the identification of homologous or related genes; the production
of compositions that modulate the expression or function of the subject proteins; the
identification of endogenous ligands for the subject orphan receptors; the generation of
functional binding proteins for the neutralization of the actions of endogenous ligands;
gene therapy; mapping functional regions of the protein; and studying associated
physiological pathways. In addition, modulation of the gene activity in vivo is used for
prophylactic and therapeutic purposes, and the like.

Brief Description of the Figures

Fig. 1 provides the nucleotide and amino acid sequence for human LGR4.

Fig. 2 provides the nucleotide and amino acid sequence for human LGR5.

Fig. 3 provides the nucleotide and amino acid sequence for human LGR7, long form.

Fig. 4 provides the nucleotide and amino acid sequence for human LGR7, short form.

Fig. 5 provides an alignment comparison of the long and short forms of LGR7.

Fig. 6 provides a comparison of the deduced amino acid sequences of the LGR4 and LGR5
cDNAs and those encoding the FSH and LH receptors.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

Novel mammalian G-protein coupled receptors having extra-cellular leucine rich
repeat regions (i.e. LGR4, LGR5 and LGR7) and polypeptide compositions related
thereto, as well as nucleic acid compositions encoding the same, are provided. The
subject polypeptide and/or nucleic acid compositions find use in a variety of different
applications, including the identification of homologous or related genes; the
identification of endogenous ligands for these novel receptors; the production of
compositions that modulate the expression or function of the receptors; gene therapy;
mapping functional regions of the receptors; studying associated physiological
pathways; in vivo prophylactic and therapeutic purposes; use as immunogens for producing
antibodies; screening for biologically active agents; and the like.

Before the subject invention is further described, it is to be understood that the
invention is not limited to the particular embodiments of the invention described below, as
variations of the particular embodiments may be made and still fall within the scope of the
appended claims. It is also to be understood that the terminology employed is for the
purpose of describing particular embodiments, and is not intended to be limiting. Instead,
the scope of the present invention will be established by the appended claims.

In this specification and the appended claims, the singular forms "a," "an," and
"the" include plural reference unless the context clearly dictates otherwise. Unless defined
otherwise, all technical and scientific terms used herein have the same meaning as
commonly understood by one of ordinary skill in the art to which this invention belongs.

Characterization of LGR4, LGR5 and LGR7

LGR4, LGR5 and LGR7 are novel mammalian receptors of the G-protein coupled,
seven trans-membrane family of proteins, specifically the subfamily of G-protein coupled
seven trans-membrane proteins which are characterized by the presence of extra-cellular
leucine rich repeat regions. As such, these proteins have trans-membrane segments and
extra-cellular regions similar to those found in the known LH, FSH, and TSH receptors. In
other words, these proteins have both a G-protein coupled seven trans-membrane region
and a leucine rich repeat extra-cellular domain. The N-terminal extra-cellular domains of
these proteins also show high homology with Drosophila Slit and Toll proteins having
leucine rich repeats. These proteins are expressed in diverse tissues.

The human LGR4 gene has a nucleotide sequence as shown in SEQ ID NO:01.
The human LGR4 gene product has an amino acid sequence as shown in SEQ ID NO:02.
LGR4 is expressed in a plurality of different tissue types, including ovary, testis, adrenal,
placenta, liver, kidney and intestine.

The human LGR5 gene has a nucleotide sequence as shown in SEQ ID NO:03.
The LGR5 gene product has an amino acid sequence as shown in SEQ ID NO:04. LGR5
has been found to be mainly expressed in muscle, placenta and spinal cord tissue.

The human LGR7 gene encodes multiple splicing variants, each of which contains
a multitude of cysteine-rich low density lipoprotein (LDL) binding motifs at the
N-terminus in addition to the leucine rich repeat region. The longer forms of LGR7 have a
higher similarity than the shorter forms of LGR7 to snail LGR in the trans-membrane domain
and the N-terminal LDL binding domain. The overall structure of both the long and short
forms of LGR7 is similar to that of the LH receptor. The human LGR7 short form gene
has a nucleotide sequence as shown in SEQ ID NO:05. The LGR7 short form gene
product has an amino acid sequence as shown in SEQ ID NO:06. The human LGR7 long
form gene has a nucleotide sequence as shown in SEQ ID NO:07. The LGR7 long form
gene product has an amino acid sequence as shown in SEQ ID NO:08. LGR7 is expressed
in multiple tissues, including testis, ovary, prostate, intestine and colon.

Identification of LGR4, LGR5 and LGR7 Sequences

Homologs of LGR4, LGR5 and LGR7 are identified by any of a number of
methods. A fragment of the provided cDNA may be used as a hybridization probe against
a cDNA library from the target organism of interest, where low stringency conditions are
used. The probe may be a large fragment, or one or more short degenerate primers.

Nucleic acids having sequence similarity are detected by hybridization under low
stringency conditions, for example, at 50°C and 6xSSC (0.9 M sodium chloride/0.09 M
sodium citrate) and remain bound when subjected to washing at 55°C in 1xSSC (0.15 M
sodium chloride/0.015 M sodium citrate). Sequence identity may be determined by
hybridization under stringent conditions, for example, at 50°C or higher and 0.1xSSC (15
mM sodium chloride/1.5 mM sodium citrate). Nucleic acids having a region of
substantial identity to the provided LGR4, LGR5 and/or LGR7 sequences, e.g. allelic
variants, genetically altered versions of the gene, etc., bind to the provided sequences
under stringent hybridization conditions. By using probes, particularly labeled probes of
DNA sequences, one can isolate homologous or related genes. The source of homologous
genes may be any species, e.g., primate species, particularly human; rodents, such as rats
and mice; canines; felines; bovines; ovines; equines; yeast; nematodes; etc.
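
The salt concentrations quoted for the 6x, 1x and 0.1x SSC buffers all follow from scaling the 1xSSC composition given above. The following is a minimal illustrative sketch of that arithmetic; the function name `ssc_concentrations` and the use of millimolar units are assumptions made for the example and are not part of the disclosure.

```python
from fractions import Fraction

# 1xSSC composition as stated in the text (0.15 M NaCl / 0.015 M sodium citrate),
# expressed here in millimolar so the arithmetic stays exact.
ONE_X_SSC_MM = {"sodium chloride": Fraction(150), "sodium citrate": Fraction(15)}


def ssc_concentrations(fold):
    """Return the salt concentrations (mM) of an SSC buffer at the given fold strength."""
    factor = Fraction(str(fold))  # str() keeps 0.1 exact as 1/10
    return {salt: conc * factor for salt, conc in ONE_X_SSC_MM.items()}


if __name__ == "__main__":
    # Low stringency hybridization, low stringency wash, stringent conditions.
    for fold in (6, 1, 0.1):
        conc = ssc_concentrations(fold)
        parts = ", ".join(f"{float(value):g} mM {salt}" for salt, value in conc.items())
        print(f"{fold}xSSC: {parts}")
```

Lower fold strength means less salt and therefore higher stringency at a given temperature, which is why the 0.1xSSC condition is the one used to test for sequence identity.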

Between mammalian species, e.g., human and mouse, homologs have substantial
sequence similarity, e.g. at least 75% sequence identity, usually at least 90%, more usually
at least 95% between nucleotide sequences. Sequence similarity is calculated based on a
reference sequence, which may be a subset of a larger sequence, such as a conserved
motif, coding region, flanking region, etc. A reference sequence will usually be at least
about 18 nt long, more usually at least about 30 nt long, and may extend to the complete
sequence that is being compared. Algorithms for sequence analysis are known in the art,
such as BLAST, described in Altschul et al. (1990), J. Mol. Biol. 215:403-10. Unless
specified otherwise, all sequence analysis numbers provided herein are as determined with
the BLAST program using default settings. The sequences provided herein are essential
for recognizing LGR4, LGR5 and LGR7-related and homologous proteins in database
searches.
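
The identity thresholds above can be illustrated with a simple calculation over a pair of sequences that have already been aligned. This is only a sketch under stated assumptions: it is not the BLAST algorithm referred to in the text, the function names are hypothetical, and the two 20 nt sequences are made up for the example rather than taken from the sequence listing.

```python
MIN_REFERENCE_NT = 18  # the text gives "at least about 18 nt" for a reference sequence


def percent_identity(aligned_ref, aligned_query):
    """Percent identity over aligned columns; '-' marks a gap."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    if len(aligned_ref.replace("-", "")) < MIN_REFERENCE_NT:
        raise ValueError("reference sequence is shorter than 18 nt")
    # Skip columns in which both sequences carry a gap.
    columns = [
        (r, q)
        for r, q in zip(aligned_ref.upper(), aligned_query.upper())
        if not (r == "-" and q == "-")
    ]
    matches = sum(1 for r, q in columns if r == q and r != "-")
    return 100.0 * matches / len(columns)


def identity_tier(identity):
    """Map a percent identity onto the thresholds named in the text."""
    for threshold in (95, 90, 75):
        if identity >= threshold:
            return f"at least {threshold}%"
    return "below 75%"


if __name__ == "__main__":
    reference = "ATGGCCTTAGCTAGCTAGGA"  # hypothetical 20 nt reference
    candidate = "ATGGCCTTAGCTAGCTAGGC"  # same sequence with one substitution
    identity = percent_identity(reference, candidate)
    print(identity, identity_tier(identity))
```

Figures produced this way will not necessarily match BLAST output, since BLAST reports identity over locally aligned segments chosen by its own scoring scheme.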

LGR4, LGR5 and LGR7 nucleic acid compositions

Nucleic acids encoding LGR4, LGR5 and LGR7 may be cDNA or genomic DNA
or a fragment thereof. The terms "LGR4 gene", "LGR5 gene" and "LGR7 gene" shall be
intended to mean the open reading frame encoding specific LGR4, LGR5 and LGR7
polypeptides, and LGR4, LGR5 and LGR7 introns, as well as adjacent 5' and 3' non-coding
nucleotide sequences involved in the regulation of expression, up to about 20 kb beyond
the coding region, but possibly further in either direction. The gene may be introduced
into an appropriate vector for extra-chromosomal maintenance or for integration into a
host genome.

The term "cDNA" as used herein is intended to include all nucleic acids that share
the arrangement of sequence elements found in native mature mRNA species, where
sequence elements are exons and 3' and 5' non-coding regions. Normally mRNA species
have contiguous exons, with the intervening introns, when present, removed by nuclear
RNA splicing, to create a continuous open reading frame encoding an LGR4, LGR5 or
LGR7 protein.

A genomic sequence of interest comprises the nucleic acid present between the
initiation codon and the stop codon, as defined in the listed sequences, including all of the
introns that are normally present in a native chromosome. It may further include the 3'
and 5' untranslated regions found in the mature mRNA. It may further include specific
transcriptional and translational regulatory sequences, such as promoters, enhancers, etc.,
including about 1 kb, but possibly more, of flanking genomic DNA at either the 5' or 3'
end of the transcribed region. The genomic DNA may be isolated as a fragment of 100
kbp or smaller, and substantially free of flanking chromosomal sequence. The genomic
DNA flanking the coding region, either 3' or 5', or internal regulatory sequences as
sometimes found in introns, contains sequences required for proper tissue and stage
specific expression.

The sequence of the 5' flanking region may be utilized for promoter elements,
including enhancer binding sites, that provide for developmental regulation in tissues
where LGR4, LGR5 and/or LGR7 is expressed. The tissue specific expression is useful for
determining the pattern of expression, and for providing promoters that mimic the native
pattern of expression. Naturally occurring polymorphisms in the promoter region are
useful for determining natural variations in expression, particularly those that may be
associated with disease.

Alternatively, mutations may be introduced into the promoter region to determine
the effect of altering expression in experimentally defined systems. Methods for the
identification of specific DNA motifs involved in the binding of transcriptional factors are
known in the art, e.g. sequence similarity to known binding motifs, gel retardation studies,
etc. For examples, see Blackwell et al. (1995), Mol. Med. 1:194-205; Mortlock et al.
(1996), Genome Res. 6:327-33; and Joulin and Richard-Foy (1995), Eur. J. Biochem.
232:620-626.

The regulatory sequences may be used to identify cis-acting sequences required for
transcriptional or translational regulation of LGR4, LGR5 and/or LGR7 expression,
especially in different tissues or stages of development, and to identify cis-acting
sequences and trans-acting factors that regulate or mediate LGR4, LGR5 and/or LGR7
expression. Such transcriptional or translational control regions may be operably linked to
an LGR4, LGR5 or LGR7 gene in order to promote expression of wild type or altered
LGR4, LGR5 or LGR7 or other proteins of interest in cultured cells, or in embryonic, fetal
or adult tissues, and for gene therapy.

The nucleic acid compositions of the subject invention may encode all or a part of
the subject polypeptides. Double or single stranded fragments of the DNA sequence may be
obtained by chemically synthesizing oligonucleotides in accordance with
conventional methods, by restriction enzyme digestion, by PCR amplification, etc. For
the most part, DNA fragments will be of at least 15 nt, usually at least 18 nt or 25 nt, and
may be at least about 50 nt. Such small DNA fragments are useful as primers for PCR,
hybridization screening probes, etc. Larger DNA fragments, i.e. greater than 100 nt, are
useful for production of the encoded polypeptide. For use in amplification reactions, such
as PCR, a pair of primers will be used. The exact composition of the primer sequences is
not critical to the invention, but for most applications the primers will hybridize to the
subject sequence under stringent conditions, as known in the art. It is preferable to choose
a pair of primers that will generate an amplification product of at least about 50 nt,
preferably at least about 100 nt. Algorithms for the selection of primer sequences are
generally known, and are available in commercial software packages. Amplification
primers hybridize to complementary strands of DNA, and will prime towards each other.
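
The primer geometry described above (two primers on complementary strands, priming towards each other, with a product of at least about 50 nt) can be checked with a short calculation. The sketch below assumes exact matching of primer to template, which is a simplification of hybridization under stringent conditions; the function names, the template and both primer sequences are hypothetical and are not taken from the sequence listing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template, forward_primer, reverse_primer):
    """Length of the product a primer pair would give on `template` (top strand).

    Returns None when the primers do not face each other on the template.
    """
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the top strand, so the top strand contains
    # the reverse complement of the primer, downstream of the forward primer.
    site = reverse_complement(reverse_primer)
    site_start = template.find(site, start + len(forward_primer))
    if site_start == -1:
        return None
    return site_start + len(site) - start


if __name__ == "__main__":
    upstream_site = "ACGTACGTACGTACGTAC"    # hypothetical 18 nt forward primer site
    downstream_site = "CCTGAAGTCCATGGCTAA"  # hypothetical 18 nt site on the top strand
    template = upstream_site + "T" * 70 + downstream_site
    forward = upstream_site
    reverse = reverse_complement(downstream_site)
    length = amplicon_length(template, forward, reverse)
    print(length, "nt; meets 50 nt minimum:", length is not None and length >= 50)
```
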

The LGR4, LGR5 and LGR7 genes are isolated and obtained in substantial purity,
generally as other than an intact chromosome. Usually, the DNA will be obtained
substantially free of other nucleic acid sequences that do not include an LGR4, LGR5 or
LGR7 sequence or fragment thereof, generally being at least about 50%, usually at least
about 90% pure, and is typically "recombinant", i.e. flanked by one or more nucleotides
with which it is not normally associated on a naturally occurring chromosome.

WO 99/48921

PCT/US99/06573

The DNA may also be used to identify expression of the gene in a biological
specimen. The manner in which one probes cells for the presence of particular nucleotide
sequences, as genomic DNA or RNA, is well established in the literature and does not
require elaboration here. DNA or mRNA is isolated from a cell sample. The mRNA may
be amplified by RT-PCR, using reverse transcriptase to form a complementary DNA
strand, followed by polymerase chain reaction amplification using primers specific for the
subject DNA sequences. Alternatively, the mRNA sample is separated by gel
electrophoresis, transferred to a suitable support, e.g. nitrocellulose, nylon, etc., and then
probed with a fragment of the subject DNA. Other techniques, such as
oligonucleotide ligation assays, in situ hybridizations, and hybridization to DNA probes
arrayed on a solid chip may also find use. Detection of mRNA hybridizing to the subject
sequence is indicative of LGR4, LGR5 and/or LGR7 gene expression in the sample.

The sequence of an LGR4, LGR5 or LGR7 gene, including flanking promoter
regions and coding regions, may be mutated in various ways known in the art to generate
targeted changes in promoter strength, sequence of the encoded protein, etc. The DNA
sequence or protein product of such a mutation will usually be substantially similar to the
sequences provided herein, i.e. will differ by at least one nucleotide or amino acid,
respectively, and may differ by at least two but not more than about ten nucleotides or
amino acids. The sequence changes may be substitutions, insertions, deletions, or a
combination thereof. Deletions may further include larger changes, such as deletions of a
domain or exon. Other modifications of interest include epitope tagging, e.g. with the
FLAG system, HA, etc. For studies of subcellular localization, fusion proteins with green
fluorescent proteins (GFP) may be used.

Techniques for in vitro mutagenesis of cloned genes are known. Examples of
protocols for site specific mutagenesis may be found in Gustin et al. (1993),
Biotechniques 14:22; Barany (1985), Gene 37:111-23; Colicelli et al. (1985), Mol. Gen.
Genet. 199:537-9; and Prentki et al. (1984), Gene 29:303-13. Methods for site specific
mutagenesis can be found in Sambrook et al., Molecular Cloning: A Laboratory Manual,
CSH Press 1989, pp. 15.3-15.108; Weiner et al. (1993), Gene 126:35-41; Sayers et al.
(1992), Biotechniques 13:592-6; Jones and Winistorfer (1992), Biotechniques 12:528-30;
Barton et al. (1990), Nucleic Acids Res. 18:7349-55; Marotti and Tomich (1989), Gene
Anal. Tech. 6:67-70; and Zhu (1989), Anal. Biochem. 177:120-4. Such mutated genes may
be used to study structure-function relationships of LGR4, LGR5 and/or LGR7, or to alter
properties of the protein that affect its function or regulation.

LGR4, LGR5 and LGR7 Polypeptides

Also provided by the subject invention are LGR4, LGR5 and LGR7 polypeptide
compositions. The term polypeptide composition as used herein refers to both the full
length proteins as well as portions or fragments thereof. Also included in this term are
variations of the naturally occurring proteins, where such variations are homologous or
substantially similar to the naturally occurring protein, be the naturally occurring protein
the human protein, mouse protein, or protein from some other species which naturally
expresses an LGR4, LGR5 or LGR7 protein, usually a mammalian species. A candidate
homologous protein is substantially similar to an LGR4, LGR5 or LGR7 protein of the
subject invention, and therefore is an LGR4, LGR5 or LGR7 protein of the subject
invention, if the candidate protein has a sequence that has at least about 80%, usually at
least about 90% and more usually at least about 98% sequence identity with an LGR4,
LGR5 or LGR7 protein, as measured by BLAST, supra. In the following description of
the subject invention, the term "LGR4, LGR5 or LGR7 protein" is used to refer not only
to the human LGR4, LGR5 or LGR7 protein, but also to homologs thereof expressed in
non-human species, e.g. murine, rat and other mammalian species.

The subject gene may be employed for producing all or portions of LGR4, LGR5
and LGR7 polypeptides. By "LGR4 polypeptide/protein", "LGR5 polypeptide/protein"
and "LGR7 polypeptide/protein" is meant an amino acid sequence encoded by an open
reading frame (ORF) of LGR4, LGR5 and LGR7 genes, including the full-length native
polypeptide and fragments thereof, particularly biologically active fragments and/or
fragments corresponding to functional domains, e.g. extra-cellular regions; and including
fusions of the subject polypeptides to other proteins or parts thereof, e.g. chimeric
proteins. For expression, an expression cassette may be employed. The expression vector
will provide a transcriptional and translational initiation region, which may be inducible or
constitutive, where the coding region is operably linked under the transcriptional control
of the transcriptional initiation region, and a transcriptional and translational termination
region. These control regions may be native to an LGR4, LGR5 or LGR7 gene, or may be
derived from exogenous sources.

Expression vectors generally have convenient restriction sites located near the
promoter sequence to provide for the insertion of nucleic acid sequences encoding
heterologous proteins. A selectable marker operative in the expression host may be
present. Expression vectors may be used for the production of fusion proteins, where the
exogenous fusion peptide provides additional functionality, i.e. increased protein
synthesis, stability, reactivity with defined antisera, an enzyme marker, e.g.
β-galactosidase, etc.

Expression cassettes may be prepared comprising a transcription initiation region,
the gene or fragment thereof, and a transcriptional termination region. Of particular
interest is the use of sequences that allow for the expression of functional epitopes or
domains, usually at least about 8 amino acids in length, more usually at least about 15
amino acids in length, to about 25 amino acids, and up to the complete open reading frame
of the gene. After introduction of the DNA, the cells containing the construct may be
selected by means of a selectable marker, the cells expanded and then used for expression.

LGR4, LGR5 or LGR7 polypeptides may be expressed in prokaryotes or
eukaryotes in accordance with conventional ways, depending upon the purpose for
expression. For large scale production of the protein, a unicellular organism, such as E.
coli, B. subtilis, S. cerevisiae, insect cells in combination with baculovirus vectors, or cells
of a higher organism such as vertebrates, particularly mammals, e.g. COS 7 cells, may be
used as the expression host cells. In some situations, it is desirable to express the LGR4,
LGR5 or LGR7 gene in eukaryotic cells, where the LGR4, LGR5 or LGR7 protein will
benefit from native folding and post-translational modifications. Small peptides can also
be synthesized in the laboratory. Polypeptides that are subsets of the complete LGR4,
LGR5 or LGR7 sequence may be used to identify and investigate parts of the protein
important for function or to raise antibodies directed against these regions.

For production of the extracellular domain of the LGR4, LGR5 or LGR7 receptor,
the anchored receptor approach as described in Osuga et al., Mol. Endocrinol. (1997) 11:
1659-1668 may be employed. Likewise, the chimeric receptor approach described in Kudo
et al., J. Biol. Chem. (1996) 271: 22470-22478 may be used.

Such peptides find use in the identification of endogenous ligands and in drug
screening for agonists and antagonists using methods described in Osuga, supra.
Solubilized extracellular domains find use as therapeutic agents, e.g. in the neutralization
of the action of endogenous ligands.

With the availability of the protein or fragments thereof in large amounts, by
employing an expression host, the protein may be isolated and purified in accordance with
conventional ways. A lysate may be prepared of the expression host and the lysate
purified using HPLC, exclusion chromatography, gel electrophoresis, affinity
chromatography, or other purification technique. The purified protein will generally be at
least about 80% pure, preferably at least about 90% pure, and may be up to and including
100% pure. Pure is intended to mean free of other proteins, as well as cellular debris.

The expressed LGR4, LGR5 and LGR7 polypeptides are useful for the production
of antibodies, where short fragments provide for antibodies specific for the particular
polypeptide, and larger fragments or the entire protein allow for the production of
antibodies over the surface of the polypeptide. Antibodies may be raised to the wild-type
or variant forms of LGR4, LGR5 or LGR7. Antibodies may be raised to isolated peptides
corresponding to particular domains, or to the native protein.

Antibodies are prepared in accordance with conventional ways, where the
expressed polypeptide or protein is used as an immunogen, by itself or conjugated to
known immunogenic carriers, e.g. KLH, pre-S HBsAg, other viral or eukaryotic proteins,
or the like. Various adjuvants may be employed, with a series of injections, as
appropriate. Both polyclonal and monoclonal antibodies may be produced. For
monoclonal antibodies, after one or more booster injections, the spleen is isolated, the
lymphocytes immortalized by cell fusion, and then screened for high affinity antibody
binding. The immortalized cells, i.e. hybridomas, producing the desired antibodies may
then be expanded. For further description, see Monoclonal Antibodies: A Laboratory
Manual, Harlow and Lane eds., Cold Spring Harbor Laboratories, Cold Spring Harbor,
New York, 1988. If desired, the mRNA encoding the heavy and light chains may be
isolated and mutagenized by cloning in E. coli, and the heavy and light chains mixed to
further enhance the affinity of the antibody. Alternatives to in vivo immunization as a
method of raising antibodies include binding to phage "display" libraries, usually in
conjunction with in vitro affinity maturation.

Diagnostic Uses

The subject nucleic acid and/or polypeptide compositions may be used to analyze a
patient sample for the presence of polymorphisms associated with a disease state or
genetic predisposition to a disease state. Biochemical studies may be performed to
determine whether a sequence polymorphism in an LGR4, LGR5 or LGR7 coding region or
control regions is associated with disease. Disease associated polymorphisms may include
deletion or truncation of the gene, mutations that alter expression level, that affect the
activity of the protein, and the like.

Changes in the promoter or enhancer sequence that may affect expression levels of
LGR4, LGR5 or LGR7 can be compared to expression levels of the normal allele by
various methods known in the art. Methods for determining promoter or enhancer
strength include quantitation of the expressed natural protein; insertion of the variant
control element into a vector with a reporter gene such as β-galactosidase, luciferase,
chloramphenicol acetyltransferase, etc. that provides for convenient quantitation; and the
like.

A number of methods are available for analyzing nucleic acids for the presence of
a specific sequence, e.g. a disease associated polymorphism. Where large amounts of
DNA are available, genomic DNA is used directly. Alternatively, the region of interest is
cloned into a suitable vector and grown in sufficient quantity for analysis. Cells that
express LGR4, LGR5 or LGR7 may be used as a source of mRNA, which may be assayed
directly or reverse transcribed into cDNA for analysis. The nucleic acid may be amplified
by conventional techniques, such as the polymerase chain reaction (PCR), to provide
sufficient amounts for analysis. The use of the polymerase chain reaction is described in
Saiki, et al. (1985), Science 239:487, and a review of techniques may be found in
Sambrook, et al., Molecular Cloning: A Laboratory Manual, CSH Press 1989, pp. 14.2-
14.33. Alternatively, various methods are known in the art that utilize oligonucleotide
ligation as a means of detecting polymorphisms, for examples see Riley et al. (1990),
Nucl. Acids Res. 18:2887-2890; and Delahunty et al. (1996), Am. J. Hum. Genet. 58:1239-
1246.

A detectable label may be included in an amplification reaction. Suitable labels
include fluorochromes, e.g. fluorescein isothiocyanate (FITC), rhodamine, Texas Red,
phycoerythrin, allophycocyanin, 6-carboxyfluorescein (6-FAM), 2',7'-dimethoxy-4',5'-
dichloro-6-carboxyfluorescein (JOE), 6-carboxy-X-rhodamine (ROX), 6-carboxy-
2',4',7',4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein (5-FAM) or N,N,N',N'-
tetramethyl-6-carboxyrhodamine (TAMRA); radioactive labels, e.g. 32P, 35S, 3H; etc. The
label may be a two stage system, where the amplified DNA is conjugated to biotin,
haptens, etc. having a high affinity binding partner, e.g. avidin, specific antibodies, etc.,
where the binding partner is conjugated to a detectable label. The label may be
conjugated to one or both of the primers. Alternatively, the pool of nucleotides used in the
amplification is labeled, so as to incorporate the label into the amplification product.

The sample nucleic acid, e.g. amplified or cloned fragment, is analyzed by one of a
number of methods known in the art. The nucleic acid may be sequenced by dideoxy or
other methods, and the sequence of bases compared to a wild-type LGR4, LGR5 or LGR7
sequence. Hybridization with the variant sequence may also be used to determine its
presence, by Southern blots, dot blots, etc. The hybridization pattern of a control and
variant sequence to an array of oligonucleotide probes immobilized on a solid support, as
described in US 5,445,934, or in WO 95/35505 (the disclosures of which are herein
incorporated by reference), may also be used as a means of detecting the presence of
variant sequences. Single strand conformational polymorphism (SSCP) analysis,
denaturing gradient gel electrophoresis (DGGE), and heteroduplex analysis in gel matrices
are used to detect conformational changes created by DNA sequence variation as
alterations in electrophoretic mobility. Alternatively, where a polymorphism creates or
destroys a recognition site for a restriction endonuclease, the sample is digested with that
endonuclease, and the products size fractionated to determine whether the fragment was
digested. Fractionation is performed by gel or capillary electrophoresis, particularly
acrylamide or agarose gels.
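
The restriction digest approach described above can be illustrated by computing the fragment sizes expected from a wild-type and a variant fragment. The sketch below is illustrative only: the function name and both sequences are hypothetical, and the enzyme is assumed to be EcoRI, which recognizes GAATTC and cuts the top strand after the G; the text does not name a particular enzyme.

```python
def digest_fragment_sizes(sequence, site, cut_offset):
    """Sizes of the linear fragments produced by cutting at every `site`.

    `cut_offset` is the number of bases into the recognition site at which
    the top strand is cut (1 for EcoRI, G^AATTC).
    """
    sequence = sequence.upper()
    cuts = []
    position = sequence.find(site)
    while position != -1:
        cuts.append(position + cut_offset)
        position = sequence.find(site, position + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    left_flank = "ACGT" * 10   # hypothetical 40 nt flank
    right_flank = "TGCA" * 15  # hypothetical 60 nt flank
    wild_type = left_flank + "GAATTC" + right_flank
    variant = left_flank + "GAGTTC" + right_flank  # substitution destroys the site

    print("wild type:", digest_fragment_sizes(wild_type, "GAATTC", 1))
    print("variant:  ", digest_fragment_sizes(variant, "GAATTC", 1))
```

A fragment that stays at its full length after digestion indicates the allele lacking the site, while two smaller bands indicate the allele that carries it.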

Screening for mutations in LGR4, LGR5 or LGR7 may be based on the functional
or antigenic characteristics of the protein. Protein truncation assays are useful in detecting
deletions that may affect the biological activity of the protein. Various immunoassays
designed to detect polymorphisms in LGR4, LGR5 or LGR7 proteins may be used in
screening. Where many diverse genetic mutations lead to a particular disease phenotype,
functional protein assays have proven to be effective screening tools. The activity of the
encoded LGR4, LGR5 or LGR7 protein may be determined by comparison with the wild-
type protein.

Antibodies specific for LGR4, LGR5 or LGR7 proteins may be used in staining or 
in immunoassays. Samples, as used herein, include biological fluids such as semen, 
blood, cerebrospinal fluid, tears, saliva, lymph, dialysis fluid and the like; organ or tissue 
culture derived fluids; and fluids extracted from physiological tissues. Also included in
the term are derivatives and fractions of such fluids. The cells may be dissociated, in the 
case of solid tissues, or tissue sections may be analyzed. Alternatively a lysate of the cells 
may be prepared. 

Diagnosis may be performed by a number of methods to determine the absence or
presence or altered amounts of normal or abnormal LGR4, LGR5 or LGR7 in patient
cells. For example, detection may utilize staining of cells or histological sections,
performed in accordance with conventional methods. Cells are permeabilized to stain
cytoplasmic molecules. The antibodies of interest are added to the cell sample, and
incubated for a period of time sufficient to allow binding to the epitope, usually at least
about 10 minutes. The antibody may be labeled with radioisotopes, enzymes, fluorescers,
chemiluminescers, or other labels for direct detection. Alternatively, a second stage
antibody or reagent is used to amplify the signal. Such reagents are well known in the art.
For example, the primary antibody may be conjugated to biotin, with horseradish
peroxidase-conjugated avidin added as a second stage reagent. Alternatively, the
secondary antibody may be conjugated to a fluorescent compound, e.g. fluorescein, rhodamine,
Texas red, etc. Final detection uses a substrate that undergoes a color change in the
presence of the peroxidase. The absence or presence of antibody binding may be
determined by various methods, including flow cytometry of dissociated cells,
microscopy, radiography, scintillation counting, etc.

Diagnostic screening may also be performed for polymorphisms that are
genetically linked to a disease predisposition, particularly through the use of microsatellite
markers or single nucleotide polymorphisms. Frequently the microsatellite polymorphism
itself is not phenotypically expressed, but is linked to sequences that result in a disease
predisposition. However, in some cases the microsatellite sequence itself may affect gene
expression. Microsatellite linkage analysis may be performed alone, or in combination
with direct detection of polymorphisms, as described above. The use of microsatellite
markers for genotyping is well documented. For examples, see Mansfield et al. (1994),
Genomics 24:225-233; Ziegle et al. (1992), Genomics 14:1026-1031; Dib et al., supra.

Modulation of LGR4, LGR5 and LGR7 Gene Expression 
The LGR4, LGR5 or LGR7 genes, gene fragments, or the LGR4, LGR5 or LGR7
protein or protein fragments, are useful in gene therapy to treat disorders associated with 
LGR4, LGR5 or LGR7 defects. Expression vectors may be used to introduce the LGR4, 
LGR5 or LGR7 gene into a cell. Such vectors generally have convenient restriction sites 
located near the promoter sequence to provide for the insertion of nucleic acid sequences. 
Transcription cassettes may be prepared comprising a transcription initiation region, the 
target gene or fragment thereof, and a transcriptional termination region. The 
transcription cassettes may be introduced into a variety of vectors, e.g. plasmid; retrovirus, 
e.g. lentivirus; adenovirus; and the like, where the vectors are able to transiently or stably 
be maintained in the cells, usually for a period of at least about one day, more usually for a 
period of at least about several days to several weeks. 

The gene or LGR4, LGR5 or LGR7 protein may be introduced into tissues or host 
cells by any number of routes, including viral infection, microinjection, or fusion of 
vesicles. Jet injection may also be used for intramuscular administration, as described by 
Furth et al. (1992), Anal. Biochem. 205:365-368. The DNA may be coated onto gold
microparticles, and delivered intradermally by a particle bombardment device, or "gene
gun" as described in the literature (see, for example, Tang et al. (1992), Nature
356:152-154), where gold microprojectiles are coated with the LGR4, LGR5 or LGR7 
DNA, then bombarded into skin cells. 

Antisense molecules can be used to down-regulate expression of LGR4, LGR5, or 
LGR7 in cells. The anti-sense reagent may be antisense oligonucleotides (ODN), 
particularly synthetic ODN having chemical modifications from native nucleic acids, or
nucleic acid constructs that express such anti-sense molecules as RNA. The antisense
sequence is complementary to the mRNA of the targeted gene, and inhibits expression of
the targeted gene products. Antisense molecules inhibit gene expression through various
mechanisms, e.g. by reducing the amount of mRNA available for translation, through
activation of RNAse H, or steric hindrance. One or a combination of antisense molecules
may be administered, where a combination may comprise multiple different sequences. 

Antisense molecules may be produced by expression of all or a part of the target
gene sequence in an appropriate vector, where the transcriptional initiation is oriented
such that an antisense strand is produced as an RNA molecule. Alternatively, the
antisense molecule is a synthetic oligonucleotide. Antisense oligonucleotides will
generally be at least about 7, usually at least about 12, more usually at least about 20
nucleotides in length, and not more than about 500, usually not more than about 50, more
usually not more than about 35 nucleotides in length, where the length is governed by
efficiency of inhibition, specificity, including absence of cross-reactivity, and the like. It
has been found that short oligonucleotides, of from 7 to 8 bases in length, can be strong
and selective inhibitors of gene expression (see Wagner et al. (1996), Nature Biotechnol.
14:840-844).

A specific region or regions of the endogenous sense strand mRNA sequence is
chosen to be complemented by the antisense sequence. Selection of a specific sequence
for the oligonucleotide may use an empirical method, where several candidate sequences
are assayed for inhibition of expression of the target gene in an in vitro or animal model.
A combination of sequences may also be used, where several regions of the mRNA
sequence are selected for antisense complementation.
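
As an illustration of how candidate antisense sequences might be enumerated for such empirical testing, the following minimal sketch slides a window along an mRNA and returns the DNA oligonucleotide complementary to each window. The 24-base mRNA, the window length and the step size are hypothetical values chosen for the example. Only the overall length limits (about 7 to about 500 nucleotides) are taken from the text above.

```python
from typing import List, Tuple

# Map each RNA base to its complementary DNA base.
COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_oligo(mrna: str, start: int, length: int) -> str:
    """Return the antisense DNA oligo (5'->3') for mrna[start:start+length]."""
    target = mrna.upper()[start:start + length]
    if len(target) != length:
        raise ValueError("window runs past the end of the mRNA")
    # Complement each base, then reverse so the oligo reads 5'->3'.
    return target.translate(COMPLEMENT)[::-1]


def candidate_oligos(mrna: str, length: int = 20, step: int = 5) -> List[Tuple[int, str]]:
    """List (start position, antisense oligo) pairs along the mRNA."""
    if not 7 <= length <= 500:
        raise ValueError("length outside the range of about 7 to about 500 nucleotides")
    return [
        (start, antisense_oligo(mrna, start, length))
        for start in range(0, len(mrna) - length + 1, step)
    ]


# Hypothetical mRNA fragment for illustration only.
mrna = "AUGGCCUUAGCAUCGGAUCCGUAA"
for start, oligo in candidate_oligos(mrna, length=12, step=6):
    print(start, oligo)
```

Each candidate would still need to be assayed for inhibition and specificity; the sketch only generates the sequences.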

Antisense oligonucleotides may be chemically synthesized by methods known in
the art (see Wagner et al. (1993), supra, and Milligan et al., supra). Preferred
oligonucleotides are chemically modified from the native phosphodiester structure, in
order to increase their intracellular stability and binding affinity. A number of such
modifications have been described in the literature, which alter the chemistry of the
backbone, sugars or heterocyclic bases.

Among useful changes in the backbone chemistry are phosphorothioates;
phosphorodithioates, where both of the non-bridging oxygens are substituted with sulfur;
phosphoroamidites; alkyl phosphotriesters and boranophosphates. Achiral phosphate
derivatives include 3'-O-5'-S-phosphorothioate, 3'-S-5'-O-phosphorothioate, 3'-CH2-5'-O-
phosphonate and 3'-NH-5'-O-phosphoroamidate. Peptide nucleic acids replace the entire
ribose phosphodiester backbone with a peptide linkage. Sugar modifications are also used
to enhance stability and affinity. The α-anomer of deoxyribose may be used, where the
base is inverted with respect to the natural β-anomer. The 2'-OH of the ribose sugar may
be altered to form 2'-O-methyl or 2'-O-allyl sugars, which provides resistance to
degradation without compromising affinity. Modification of the heterocyclic bases must
maintain proper base pairing. Some useful substitutions include deoxyuridine for
deoxythymidine; 5-methyl-2'-deoxycytidine and 5-bromo-2'-deoxycytidine for
deoxycytidine. 5-propynyl-2'-deoxyuridine and 5-propynyl-2'-deoxycytidine have been
shown to increase affinity and biological activity when substituted for deoxythymidine
and deoxycytidine, respectively.

As an alternative to anti-sense inhibitors, catalytic nucleic acid compounds, e.g.
ribozymes, anti-sense conjugates, etc. may be used to inhibit gene expression. Ribozymes
may be synthesized in vitro and administered to the patient, or may be encoded on an
expression vector, from which the ribozyme is synthesized in the targeted cell (for
example, see International patent application WO 9523225, and Beigelman et al. (1995),
Nucl. Acids Res. 23:4434-42). Examples of oligonucleotides with catalytic activity are
described in WO 9506764. Conjugates of anti-sense ODN with a metal complex, e.g.
terpyridylCu(II), capable of mediating mRNA hydrolysis are described in Bashkin et al.
(1995), Appl. Biochem. Biotechnol. 54:43-56.

Genetically Altered Cell or Animal Models for LGR4, LGR5 and LGR7 Function

The subject nucleic acids can be used to generate transgenic, non-human animals
or site specific gene modifications in cell lines. Transgenic animals may be made through
homologous recombination, where the normal LGR4, LGR5 or LGR7 locus is altered.
Alternatively, a nucleic acid construct is randomly integrated into the genome. Vectors
for stable integration include plasmids, retroviruses and other animal viruses, YACs, and
the like.

The modified cells or animals are useful in the study of LGR4, LGR5 and/or LGR7
function and regulation. For example, a series of small deletions and/or substitutions may
be made in the host's native LGR4, LGR5 or LGR7 gene to determine the role of different
exons. Of interest is the use of LGR4, LGR5 or LGR7 to construct transgenic animal
models for disease states. Specific constructs of interest include anti-sense LGR4, LGR5
or LGR7, which will block LGR4, LGR5 or LGR7 expression, expression of dominant
negative LGR4, LGR5 or LGR7 mutations, and over-expression of LGR4, LGR5 or LGR7
genes. Where an LGR4, LGR5 or LGR7 sequence is introduced, the introduced sequence
may be either a complete or partial sequence of an LGR4, LGR5 or LGR7 gene native to
the host, or may be a complete or partial LGR4, LGR5 or LGR7 sequence that is
exogenous to the host animal, e.g., a human LGR4, LGR5 or LGR7 sequence. A
detectable marker, such as lac Z, may be introduced into the LGR4, LGR5 or LGR7 locus,
where upregulation of LGR4, LGR5 or LGR7 expression will result in an easily detected
change in phenotype.

One may also provide for expression of the LGR4, LGR5 or LGR7 gene or variants
thereof in cells or tissues where it is not normally expressed, at levels not normally present
in such cells or tissues, or at abnormal times of development. By providing expression of
LGR4, LGR5 or LGR7 protein in cells in which it is not normally produced, one can
induce changes in cell behavior, e.g. through LGR4, LGR5 or LGR7 mediated activity.

DNA constructs for homologous recombination will comprise at least a portion of
the LGR4, LGR5 or LGR7 gene, which may or may not be native to the species of the
host animal, wherein the gene has the desired genetic modification(s), and includes
regions of homology to the target locus. DNA constructs for random integration need not
include regions of homology to mediate recombination. Conveniently, markers for
positive and negative selection are included. Methods for generating cells having targeted
gene modifications through homologous recombination are known in the art. For various
techniques for transfecting mammalian cells, see Keown et al. (1990), Meth. Enzymol.
185:527-537.

For embryonic stem (ES) cells, an ES cell line may be employed, or embryonic
cells may be obtained freshly from a host, e.g. mouse, rat, guinea pig, etc. Such cells are
grown on an appropriate fibroblast-feeder layer or grown in the presence of leukemia
inhibiting factor (LIF). When ES or embryonic cells have been transformed, they may be
used to produce transgenic animals. After transformation, the cells are plated onto a
feeder layer in an appropriate medium. Cells containing the construct may be detected by
employing a selective medium. After sufficient time for colonies to grow, they are picked
and analyzed for the occurrence of homologous recombination or integration of the
construct. Those colonies that are positive may then be used for embryo manipulation and
blastocyst injection. Blastocysts are obtained from 4 to 6 week old superovulated females.
The ES cells are trypsinized, and the modified cells are injected into the blastocoel of the
blastocyst. After injection, the blastocysts are returned to each uterine horn of
pseudopregnant females. Females are then allowed to go to term and the resulting
offspring screened for the construct. By providing for a different phenotype of the
blastocyst and the genetically modified cells, chimeric progeny can be readily detected.

The chimeric animals are screened for the presence of the modified gene and males
and females having the modification are mated to produce homozygous progeny. If the
gene alterations cause lethality at some point in development, tissues or organs can be
maintained as allogeneic or congenic grafts or transplants, or in in vitro culture. The
transgenic animals may be any non-human mammal, such as laboratory animals, domestic
animals, etc. The transgenic animals may be used in functional studies, drug screening,
etc., e.g. to determine the effect of a candidate drug on LGR4, LGR5 or LGR7 or related
gene activation, etc.

In vitro models for LGR4, LGR5 or LGR7 Function 
The availability of a number of components in the G-protein coupled receptor
family, as previously described, allows in vitro reconstruction of the processes or systems
in which members of this family operate. Two or more of the components, such as the
isolated receptor and a potential ligand therefor, may be combined in vitro, and the
behavior assessed in terms of activation of transcription of specific target sequences;
modification of protein components, e.g. proteolytic processing, phosphorylation,
methylation, etc.; ability of different protein components to bind to each other. The
components may be modified by sequence deletion, substitution, etc. to determine the
functional role of specific domains.

Drug screening may be performed using an in vitro model, a genetically altered
cell or animal, purified LGR4, LGR5 or LGR7 protein, as well as fragments or portions
thereof, e.g. solubilized extra-cellular domain or chimeric receptor proteins comprising the
LGR4, LGR5 or LGR7 extra-cellular domain. One can identify ligands or substrates that
bind to and modulate the action of LGR4, LGR5 or LGR7. Areas of investigation include
the development of agents that beneficially counter abnormalities related to LGR4, LGR5
or LGR7 and the use of such agents in therapy.

Drug screening identifies agents that modulate LGR4, LGR5 or LGR7 function in
abnormal cells. Of particular interest are screening assays for agents
that have a low toxicity for human cells. A wide variety of assays may be used for this
purpose, including labeled in vitro protein-protein binding assays, electrophoretic mobility
shift assays, immunoassays for protein binding, and the like. The purified protein may
also be used for determination of three-dimensional crystal structure, which can be used
for modeling intermolecular interactions, such as GTP binding, etc.

The term "agent" as used herein describes any molecule, e.g. protein or
pharmaceutical, with the capability of altering or mimicking the physiological function of
LGR4, LGR5 or LGR7. Generally a plurality of assay mixtures are run in parallel with
different agent concentrations to obtain a differential response to the various
concentrations. Typically, one of these concentrations serves as a negative control, i.e. at
zero concentration or below the level of detection.
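
A parallel set of assay mixtures of this kind can be laid out as a simple dilution series with a zero-concentration negative control. The sketch below is illustrative only; the top concentration, dilution factor and number of points are hypothetical values, and the units are whatever the assay uses.

```python
from typing import List


def concentration_series(top: float, dilution_factor: float, n_points: int) -> List[float]:
    """Return n_points serially diluted concentrations plus a zero control."""
    if top <= 0 or dilution_factor <= 1 or n_points < 1:
        raise ValueError("need top > 0, dilution_factor > 1 and n_points >= 1")
    series = [top / dilution_factor ** i for i in range(n_points)]
    # The final entry is the negative control at zero agent concentration.
    return series + [0.0]


# Hypothetical ten-fold dilution series with four agent concentrations.
for concentration in concentration_series(top=100.0, dilution_factor=10.0, n_points=4):
    print(concentration)
```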

In some embodiments, candidate agents encompass numerous chemical classes,
though typically they are organic molecules, preferably small organic compounds having a
molecular weight of more than 50 and less than about 2,500 daltons. Candidate agents
comprise functional groups necessary for structural interaction with proteins, particularly
hydrogen bonding, and typically include at least an amine, carbonyl, hydroxyl or carboxyl
group, preferably at least two of the functional chemical groups. The candidate agents
often comprise cyclical carbon or heterocyclic structures and/or aromatic or polyaromatic
structures substituted with one or more of the above functional groups. Candidate agents
are also found among biomolecules including peptides, saccharides, fatty acids, steroids,
purines, pyrimidines, derivatives, structural analogs or combinations thereof.

Candidate agents are obtained from a wide variety of sources including libraries of
synthetic or natural compounds. For example, numerous means are available for random
and directed synthesis of a wide variety of organic compounds and biomolecules,
including expression of randomized oligonucleotides and oligopeptides. Alternatively,
libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts
are available or readily produced. Additionally, natural or synthetically produced libraries
and compounds are readily modified through conventional chemical, physical and
biochemical means, and may be used to produce combinatorial libraries. Known
pharmacological agents may be subjected to directed or random chemical modifications,
such as acylation, alkylation, esterification, amidification, etc. to produce structural
analogs.
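
The size and functional-group preferences stated above for small organic candidate agents can be expressed as a simple library filter. The following is a minimal sketch under stated assumptions: the compound names, molecular weights and group annotations are hypothetical, and the Candidate record is an illustrative data structure rather than part of any described method.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Functional groups named in the text as typical of candidate agents.
FUNCTIONAL_GROUPS = frozenset({"amine", "carbonyl", "hydroxyl", "carboxyl"})


@dataclass(frozen=True)
class Candidate:
    name: str
    molecular_weight: float  # daltons
    groups: FrozenSet[str]


def meets_preferred_criteria(candidate: Candidate, min_groups: int = 2) -> bool:
    """Check molecular weight (more than 50, less than about 2,500 daltons)
    and the number of listed functional groups (preferably at least two)."""
    in_range = 50 < candidate.molecular_weight < 2500
    n_groups = len(candidate.groups & FUNCTIONAL_GROUPS)
    return in_range and n_groups >= min_groups


# Hypothetical library entries for illustration only.
library = [
    Candidate("cmpd-A", 312.4, frozenset({"amine", "hydroxyl"})),
    Candidate("cmpd-B", 18.0, frozenset({"hydroxyl"})),
    Candidate("cmpd-C", 5200.0, frozenset({"amine", "carboxyl", "carbonyl"})),
]
selected = [c.name for c in library if meets_preferred_criteria(c)]
print(selected)
```

Such a filter only narrows a library to the preferred class; it says nothing about activity, which is determined in the screening assays.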

Of particular interest in certain embodiments are peptidic agents based on LGR4,
LGR5 or LGR7, e.g. solubilized extra-cellular domain or chimeric receptor proteins
comprising the LGR4, LGR5 or LGR7 extra-cellular domain, where such agents
neutralize the activity of endogenous LGR4, LGR5 or LGR7 ligands, e.g. hormones.

Where the screening assay is a binding assay, one or more of the molecules may be
joined to a label, where the label can directly or indirectly provide a detectable signal.
Various labels include radioisotopes, fluorescers, chemiluminescers, enzymes, specific
binding molecules, particles, e.g. magnetic particles, and the like. Specific binding
molecules include pairs, such as biotin and streptavidin, digoxin and antidigoxin, etc. For
the specific binding members, the complementary member would normally be labeled
with a molecule that provides for detection, in accordance with known procedures.

A variety of other reagents may be included in the screening assay. These include
reagents like salts, neutral proteins, e.g. albumin, detergents, etc., that are used to facilitate
optimal protein-protein binding and/or reduce non-specific or background interactions.
Reagents that improve the efficiency of the assay, such as protease inhibitors, nuclease
inhibitors, anti-microbial agents, etc., may be used. The mixture of components is added
in any order that provides for the requisite binding. Incubations are performed at any
suitable temperature, typically between 4 and 40°C. Incubation periods are selected for
optimum activity, but may also be optimized to facilitate rapid high-throughput screening.
Typically between 0.1 and 1 hours will be sufficient.

Other assays of interest detect agents that mimic LGR4, LGR5 or LGR7 function.
For example, an expression construct comprising an LGR4, LGR5 or LGR7 gene may be
introduced into a cell line under conditions that allow expression. The level of LGR4,
LGR5 or LGR7 activity is determined by a functional assay, as previously described. In
one screening assay, the ability of candidate agents to inhibit or enhance LGR4, LGR5 or
LGR7 function is determined. Alternatively, candidate agents are added to a cell that
lacks functional LGR4, LGR5 or LGR7, and screened for the ability to reproduce LGR4,
LGR5 or LGR7 activity in a functional assay.

The compounds having the desired pharmacological activity may be administered
in a physiologically acceptable carrier to a host for treatment, etc. The compounds may
also be used to enhance LGR4, LGR5 or LGR7 function. The inhibitory agents may be
administered in a variety of ways, orally, topically, parenterally e.g. subcutaneously,
intraperitoneally, by viral infection, intravascularly, etc. Topical treatments are of
particular interest. Depending upon the manner of introduction, the compounds may be
formulated in a variety of ways. The concentration of therapeutically active compound in
the formulation may vary from about 0.1-100 wt.%.

The pharmaceutical compositions can be prepared in various forms, such as 
granules, tablets, pills, suppositories, capsules, suspensions, salves, lotions and the like. 
Pharmaceutical grade organic or inorganic carriers and/or diluents suitable for oral and 
topical use can be used to make up compositions containing the therapeutically-active
compounds. Diluents known to the art include aqueous media, vegetable and animal oils 
and fats. Stabilizing agents, wetting and emulsifying agents, salts for varying the osmotic 
pressure or buffers for securing an adequate pH value, and skin penetration enhancers can 
be used as auxiliary agents. 

Experimental 

The following examples are put forth so as to provide those of ordinary skill in the
art with a complete disclosure and description of how to make and use the subject
invention, and are not intended to limit the scope of what is regarded as the invention.
Efforts have been made to ensure accuracy with respect to the numbers used (e.g.
amounts, temperature, concentrations, etc.) but some experimental errors and deviations
should be allowed for. Unless otherwise indicated, parts are parts by weight, molecular
weight is average molecular weight, temperature is in degrees centigrade; and pressure is
at or near atmospheric.

Example 1. Identification of LGR4 and LGR5

Human sequences related to the sea anemone and Drosophila glycoprotein
hormone receptors were identified from the expressed sequence tag database (dbEST) at
the National Center for Biotechnology Information by using the BLAST server with the
BLOSUM62 protein comparison matrix (Altschul SF et al., Nucleic Acids Res. (1997)
25:3389-3402). Human ESTs showing high homology to two non-overlapping regions of
the gonadotropin receptors were identified. Clones AA312798 and AA298810 were
found to encode transmembrane four to five of the putative receptor LGR4 whereas
AA460529 and AA424098 encode transmembrane two to three of the putative receptor
LGR5. Using these ESTs to further search the GenBank EST division database,
overlapping EST sequences were aligned to obtain the longest open reading frame (ORF)
for these receptors.
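
The idea of taking the longest ORF from an assembled sequence can be illustrated with a minimal sketch. It assumes a hypothetical 23-base contig (not an actual EST assembly), scans only the three forward reading frames, and requires an ATG start and an in-frame stop codon. A contig assembled from partial ESTs may lack one or both, so in practice an open-ended frame might also need to be considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str) -> str:
    """Return the longest ATG-to-stop ORF in the three forward frames."""
    dna = dna.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a reading frame at the first ATG
            elif start is not None and codon in STOP_CODONS:
                orf = dna[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


# Hypothetical contig for illustration only.
contig = "CCATGAAATGGCCTAGGATGTAA"
print(longest_orf(contig))
```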

Based on the longest human ORF, specific primers were designed for PCR
amplification of LGR4 and LGR5 cDNA fragments from rat ovary and human placenta,
respectively. After hybridization with labeled EST clones and confirmation of DNA
sequences by dideoxy DNA sequencing, specific receptor fragments isolated were used to
design primers to prepare sub-cDNA libraries enriched with specific receptor cDNAs. For
5' extension, reverse transcription was performed using rat ovarian and human placenta
mRNA preparations and receptor-specific primers. Following second strand synthesis, the
enriched cDNA pool was tailed at 5'-ends with specific adaptor sequences to allow further
PCR amplification. For 3' extension, rat ovarian or human placenta mRNAs were
reverse transcribed using oligo-dT, followed by second strand synthesis using receptor-
specific primers and adaptor tailing. These mini-libraries were further used as templates
for PCR amplification of upstream or downstream cDNAs specific for each receptor using
internal primers. PCR products with a strong hybridization signal to each receptor cDNA
fragment were subcloned into the pUC18 or pcDNA3 vectors. After screening of these
sublibraries based on colony hybridization using specific receptor probes, clones with 5'-
or 3'-sequences of the putative receptors were identified and isolated for DNA
sequencing. As needed, the procedure was repeated up to three times to generate cDNAs
encoding the complete ORF of each putative receptor for sequence analysis and for the
expression of receptor proteins in eukaryotic cells. The entire coding sequences of each
gene were also amplified with specific primers flanking the entire ORF in independent
experiments. At least three independent PCR clones were sequenced to verify the
authenticity of coding sequences. The nucleotide sequence of LGR4, as well as the amino
acid sequence of the product encoded by the ORF thereof, is provided in Fig. 1. The
nucleotide sequence of LGR5, as well as the amino acid sequence of the product encoded
by the ORF thereof, is provided in Fig. 2.

Example 2. Comparison of deduced amino acid sequence of LGR4 and 5 cDNAs 
and those encoding FSH and LH receptors. 

Sequence alignment of LGR4 and LGR5 with known human glycoprotein 
hormone receptors was performed and the results are shown in Fig. 6. Shaded residues are 
identical in at least two of the four receptor proteins shown. 
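
The shading rule used for the alignment (a residue position is shaded when at least two of the four aligned receptors share the same residue) can be sketched as follows. The four six-residue sequences are hypothetical placeholders rather than the actual receptor sequences, and "-" denotes an alignment gap.

```python
from collections import Counter
from typing import List


def shaded_columns(aligned: List[str], min_identical: int = 2) -> List[bool]:
    """Flag alignment columns where some residue occurs at least min_identical times."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    flags = []
    for column in zip(*aligned):
        counts = Counter(residue for residue in column if residue != "-")
        flags.append(bool(counts) and max(counts.values()) >= min_identical)
    return flags


# Hypothetical aligned fragments for illustration only.
alignment = [
    "MKLTAV",
    "MRL-AV",
    "QHIGSV",
    "TNP-CV",
]
flags = shaded_columns(alignment)
print("".join("#" if flag else "." for flag in flags))
```

Gaps are ignored when counting, so a column containing only gaps or only unique residues is left unshaded.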

Example 3. Expression pattern of LGR4 and 5 mRNA transcripts in different
tissues. 

For northern blot analysis, poly(A)+-selected RNA from different human tissues
was hybridized with 32P-labeled cDNA probes. After washing, the blots were exposed to
X-ray films at -70°C for five days. Subsequent hybridization with a beta-actin cDNA
probe was performed to estimate nucleic acid loading (8 h exposure). LGR4 was shown to
be expressed in placenta, ovary, testis, adrenal, spinal cord, thyroid, stomach, trachea,
heart, pancreas, kidney, prostate and spleen while LGR5 was shown to be expressed in the
skeletal muscle, placenta, spinal cord, brain, adrenal, colon, stomach, ovary and bone
marrow.

Example 4. Chromosomal localization of LGR4 and 5 in human

Using genomic fragments of LGR4 (>100 Kb) and LGR5 (>100 Kb) as probes, the
chromosomal localization of these genes was detected using the FISH method on banded
DNA in chromosomal regions 5q34-35.1 and 12q15, respectively.

Example 5. Identification of LGR7.

Analysis of EST databases has revealed a novel LGR closely related to a G protein-
coupled receptor from pond snail (Lymnaea stagnalis, accession no. 481 c )46). Because the
snail G-protein coupled receptor shared the leucine-rich repeat ectodomain and seven
transmembrane region characteristics of mammalian LGRs, the novel EST sequence could
encode either a homologue of the snail receptor or a novel mammalian LGR. For the isolation of
LGR7 cDNA, a Clontech Marathon-ready testis cDNA pool was used as the template for 5'
and 3' RACE with adapter and gene-specific primers. Sequence analysis of the RACE
products showed that the LGR7 gene encodes at least two splicing variants that differ at the N-
terminus. The nucleotide sequence of the long variant, as well as the amino acid sequence of
the product encoded by the ORF thereof, is provided in Fig. 3; while the nucleotide sequence
of the short variant, as well as the amino acid sequence of the ORF thereof, is provided in Fig.
4. Both variants contain a classical C-terminal 7-transmembrane region and a leucine-rich
repeat ectodomain flanked by cysteine rich regions found in other mammalian LGRs. The
long form LGR7 contains an extra 35 amino acids in the N-terminal cysteine rich region as
compared to the short form LGR7. Of interest, analysis of the LGR7 ORF from either variant
showed that its tertiary structure resembles that of mammalian LGRs instead of the snail
receptor, which shares the greatest identity in the transmembrane region. These findings
suggest that LGR7 and the snail receptor diverged early during evolution and LGR7 perhaps
adopted a new function in higher organisms.

Based on the LGR7 cDNA sequence, we further identified a human genomic DNA
fragment (AQ053279) in the genomic survey sequence division of GenBank that contains
part of the LGR7 gene. The authenticity of this genomic clone was confirmed by Southern
blot hybridization and the genomic clone was used as the probe to identify the
chromosomal localization for the LGR7 gene.

It is evident from the above discussion and results that three novel mammalian G-
protein coupled receptors, as well as nucleic acids encoding the same, are provided by
the subject invention. The inventions described above find use in a variety of applications,
including research and therapeutic applications.

All publications and patent applications cited in this specification are herein 
incorporated by reference as if each individual publication or patent application were 
specifically and individually indicated to be incorporated by reference. The publications 
discussed herein are provided solely for their disclosure prior to the filing date of the 
present application. Nothing herein is to be construed as an admission that the invention
is not entitled to antedate such a disclosure by virtue of prior invention. 

Although the foregoing invention has been described in some detail by way of 
illustration and example for purposes of clarity of understanding, it will be readily 
apparent to those of ordinary skill in the art in light of the teachings of this invention that
certain changes and modifications may be made thereto without departing from the spirit
or scope of the appended claims.

What is Claimed is:

1. An isolated nucleic acid encoding a mammalian protein selected from the 
group consisting of LGR4, LGR5 or LGR7.

2. An isolated nucleic acid according to Claim 1, wherein said mammalian 
protein has the amino acid sequence of SEQ ID NO:2, SEQ ID NO:04, SEQ ID NO:06 or
SEQ ID NO:08.

3. An isolated nucleic acid according to Claim 1, wherein said mammalian
protein has an amino acid sequence that is substantially identical to the amino acid
sequence of SEQ ID NO:2, SEQ ID NO:04, SEQ ID NO:06 or SEQ ID NO:08.

4. An isolated nucleic acid according to Claim 1, wherein the nucleotide
sequence of said nucleic acid has the sequence selected from the group consisting of: (a)
SEQ ID NO:1 or the complementary sequence thereof; (b) SEQ ID NO:03 or the
complementary sequence thereof; (c) SEQ ID NO:05 or the complementary sequence
thereof; and (d) SEQ ID NO:07 or the complementary sequence thereof.

5. An isolated nucleic acid comprising at least 18 contiguous nucleotides of
the sequence selected from the group consisting of: (a) SEQ ID NO:1 or the
complementary sequence thereof; (b) SEQ ID NO:03 or the complementary sequence
thereof; (c) SEQ ID NO:05 or the complementary sequence thereof; and (d) SEQ ID
NO:07 or the complementary sequence thereof.

6. An isolated nucleic acid comprising at least 50 contiguous nucleotides of
the sequence selected from the group consisting of: (a) SEQ ID NO:01 or the
complementary sequence thereof; (b) SEQ ID NO:03 or the complementary sequence
thereof; (c) SEQ ID NO:05 or the complementary sequence thereof; and (d) SEQ ID
NO:07 or the complementary sequence thereof.

7. An isolated nucleic acid that hybridizes under stringent conditions to a
nucleic acid having the nucleotide sequence selected from the group consisting of: (a)
SEQ ID NO:01 or the complementary sequence thereof; (b) SEQ ID NO:03 or the
complementary sequence thereof; (c) SEQ ID NO:05 or the complementary sequence
thereof; and (d) SEQ ID NO:07 or the complementary sequence thereof.

8. An expression cassette comprising a transcriptional initiation region 
functional in an expression host, a nucleic acid having a sequence of the isolated nucleic 
acid according to Claim 1 under the transcriptional regulation of said transcriptional 
initiation region, and a transcriptional termination region functional in said expression 
host. 

9. A cell comprising an expression cassette according to Claim 8 as part of an 
extrachromosomal element or integrated into the genome of a host cell as a result of 
introduction of said expression cassette into said host cell, and the cellular progeny of said 
host cell. 

10. A method for producing a mammalian protein selected from the group
consisting of LGR4, LGR5 and LGR7, said method comprising:

growing a cell according to Claim 9, whereby said mammalian protein is
expressed; and

isolating said protein substantially free of other proteins. 

11. A purified polypeptide composition comprising at least 50 weight % of the 
protein present as a mammalian protein selected from the group consisting of LGR4, 
LGR5 and LGR7, or a fragment thereof. 

12. An antibody binding specifically to a mammalian protein selected from the 
group consisting of LGR4, LGR5 and LGR7. 

13. The antibody of Claim 12, wherein said antibody is a monoclonal antibody.

14. A non-human transgenic animal model for LGR4, LGR5 or LGR7 gene
function, wherein said transgenic animal comprises an introduced alteration in an LGR4,
LGR5 or LGR7 gene.

15. The animal model of claim 14, wherein said animal is heterozygous for 
said introduced alteration. 

16. The animal model of claim 14, wherein said animal is homozygous for said
introduced alteration.

17. The animal model of claim 14, wherein said introduced alteration is a
knockout of endogenous LGR4, LGR5 or LGR7 gene expression.

18. A method of screening a sample for the presence of a ligand for a receptor
selected from the group consisting of LGR4, LGR5 and LGR7, said method comprising:

contacting said sample with a receptor selected from the group consisting
of LGR4, LGR5 and LGR7 or a mimetic thereof, and

detecting the presence of a binding event between said receptor and ligand
in said sample.

>LGR4 nucleotide sequence (SEQ ID NO: 01)

'-ACAGCT7 3ACGGAAGTG777G7GCG7CCCCTCAGCAAC7TGCCAAC7CTCCAGGCGCT3ACC77GGC7^^^ 

T C AAG C AT C C C T G A C T T C G C T T T C A - 7 0 AAC 7 T T T 3 AAG C T T G G T G G T T C T G C A 7 C T G C AT AACr AT AAAAT T AAAAG C C 1 C 

AGTCAACACTCTTTTGAToGACTAGATAAC 7TGGAAACCTTGGACTTGAA7TACAAATTAGTTGGATG/v3TTTCCTCAGGC7 

^TTAAAGCC7T77CCAGC77TAAAGAGC7GGGA7TTCACAGTAATTCTA7TTC7G7T;^^ 

AATCCACTG 37 AAGAACTATT CATTTCTATCATAATCCT 3TGT 3TTTT GT7GGG;AA i C7CAGCATTT'~7-^AAC ( ^7G^ r T ^AT 
CTGCATTGCTTAGTCATTCGTKTG^ 

ACCTTGACAGGGACAAAAATAAGCAGCATAC 3TGA.TGATCT 3TGCCAAAA37CAJAAAGATGCTGAGGACTCTGGACTTATi'"T 

tat aac aat at aag ag acctt 3caagttt7aatggtt gt 7gtgca77tggaagaggattt 7at7 gcagtc'i aat "^aaat^t^ 
ctaataaaggaaaj\ta/CT7ttcajh73Ccta^^ 

ag t gg a, 3 3 7 7 7 t 3cg aag c 7 tg g 3 ata a t 7 act aac g 7 g g atg t aag 7 t tc aat g aat t aag 7 7 7 a7 t 7 7 c t ac ggaag g c 
ctaaatgggctcaatcaactaaa3ct7gtgg7taacttcaagc7 3aaagac3cc7tg7cac ^agagarttt^taat^ 

A GGTGT :7A7CA 3TA7CA7A 7 3CTTA7CAG7 3TT 3TGCATTTTG 3GGGTGT GAG 7 77TTATG 3AAA7T AAACA CAGAAGA7 
AACAGCCC3CAAGAACA3AGTGT3ACAAAAGAGAAAGGTGCT^^ 

G ^: A ^^ 

^ i i .T7A^A jTGTGGTT 3 AT 777 337GG7 3 3 7CTTGC7TT7CAA "773 7TTGT 3ATTTTAACAGTGTTT ^C>jTCT m "^ 
T 7ATCA7T 3CCT 3 7CTCCAAA 3T 3TT 3ATAGGCTTGATTTCTGT 3TCTAA3TTATTCATGGGCAT7TATA. 3TGGCAT 3CTT 
A3TTTT3T7GA7G3TGT3TG3-GG3GG3GA777GCG3AA7T7 3GGATT7 3 3T3GGAAAC7 3GGAGCGG:7 3CAAGGTAGG3 
GGG7CT:TGGCA3TCT77T:G7CAGAGAG:GCTG7ATTGC7AT7AAGAG7 3GCA3C7GTGGAAAv3.AA.GGGTA"7TG 
3ATTTGATGAAA:ACGGGAAGAG3ACT:A7::7CAGACAG777GAGGTG3G:g:C37CTTAGC7T7GCTGGGTGCG3GAGTG 
GGAGGC7 3CTT 37 3CCTTTTCCACGGAGGGCAATATT7TGCATCGCC7TT 37G7TT 3CGGT777 77ACA3GAGAAAC 3C 3A 
T7GTTAGGATT7A77G7GA7 37TA3T3 3TAT7AAACT7ACTGG7ATTTTTA37AA7GGC 3A77AT 3TA7A 3TAAACTATAC 
TGCAACTTAGAGAAGGAGGACCTGTCGGAAAA3T 7 3CAGTCTAGCGTGATT AAG-^A'" GT7Gr , rTGG r T "•^ T C rT1 ' T1 ^A^AAA 
^CAT77TC7737G3CC7G7TG3ATT7TT777AT7737ACCATTGA^^ 

T ;TGTT,iCACTGATATTGT7CCCGTTG Z 7TG7TT 3 3 7T 3AATCC 3GTCCTGTAT 37TTTCC7CAATC7AAAG7TTAAAGAA 
SACTGGAAGCTACTGAAGCGGCGTGTTACCAGGAAACA 

GAACAGGATTTC7ACTATGACTGTGGCATGTAT7 7C7ACTTGCAGGGTAA3C7GAC7G7C7G7GACTG TTGTGAGTCA777 
CTTTTGACAAAACCAGTATCATGCAAACACTTAATAA^^ 

AGCTCTGACCAGGTGCAGGCC7GTGGA03AGC7T ^37TCTACCAGAG7CG7GGA77CCC r 777GGTGC3^TATG7TTACAA7 
G7AGAGAGAGTCAGAGA7TGA 



>LGR4 amino acid sequence (SEQ ID NO: 02) 

MPGPLGLLCFLA1 GhLGSAGPSG^J'-.PPLCAT-.PCSCDGD'-.FVECSGKGLTAVPEGLS/^FTOALDI SMNniTijLFEDAFKS FP 
F 7 E E L 0 L A 3 N D L 7 L I K P rCA L G G LK E I/r 7 r L 7 1 N(, ■ L FT 7 F S E AT K G L S AL C S L P I T 'AN H I T 7 7 P E D S EE G I ,VQ L F\H 7 W LED 
UGLTEYF YF PLSNLPTLC/ALTLALNN I SG I F L FAFTNL3SLYYLHLKNMK2 KS LS£ HCFDGLDNLETGDLIJYl; YLDEFFQA 
I PLAL PS LKEL.G FH SNS SV I PDGAFGGEJF'LLPTI H LYDNFLE EVGI]£AFHt"LSDLr:CL\ r Z P.GAS LVQWFPULTGTVHLESI. 
^TCTKICSIPDELCQNCK^LFTLLTP 

SGAFArCI.-GTITNLDYSFNELTSFPTE -L^ , GL^JvLPX\'3KFKLFvTA*LAARC , EAKLF.GLSVPYAYQGCA.FWGCD3LCKLtI T E r 

I J S PC'E H 7 Y T K E KG AT D AAN YT S 7 AE N E E H S ~ I I I P 3 T F S T G A Fr ' P C E Y L L 3 S WM I P IT YW F I F L V AL L FM L L V I L T ' 7 ^ A ° C 
SS L PAS KL EI 3LISYSNLLMGI YTG I LTFL3 AVSW 3F FAEFGIWWETGSG SPr^AGSLAYFSSESAYFLLTLAAVEFSVF^P 

^MKHGECGHLFTFQVAALLALLGAAV 

CNLEKEEL3ENS ^SSYI KHVAK7 I FTMC I FFCF VAFFSFAPLI TAI S IS F EIMKS YTLI FFPLPACLKPYLYYFFNPPTFE. 
EWKLLPCRF VTRKHGSVSVS I SS'^GGCGECC'FYYDGGMYSHLC : GKLTVCr C7ES FLITKF'VSCFCHLI FCCHSCPYLTAA.SCC-F 
PEAYWSDCGTQSAHSDYADEEPSFYSCSSLvYCAGGFACFYIGFGFPLVPYAYNLQRVFD 

>Nucleotide sequence of LGR5 (total 2082 nucleotides) (SEQ ID NO: 03)

CTACATCTCCATAACAATAGAATCCACTCCCTGGGAAAGAAATGCTTTGATGGGCTCCACAGCCTAGAGACTTTAGATTTA 
AATTACAATAACCTTGATGAATTCCCCACTGCAATTAGGACACTCTCCAACTTAAAGGAACTAGGATTTCATAGCAACAAT 
ATCAGGTCGATACCTGAGAAAGCATTTGTAGGCAACCCTTCTCTTATTACAATACATTTCTATGACAATCCCATCCAATTT 
GTTGGGAGATCTGCTTTTCAACATTTACCTGAACTAAGAACACTGACTCTGAATGGTGCCTCACAAATAACTGAATTTCCT 
GATTTAACTGGAACTGCAAACCTGGAGAGTCTGACTTTAACTGGAGCACAGATCTCATCTCTTCCTCAAACCGTCTGCAAT 
CAGTTACCTAATCTCCAAGTGCTAGATCTGTCTTACAACCTATTAGAAGATTTACCCAGTTTTTCAGTCTGCCAAAAGCTT 
CAGAAAATTGACCTAAGACATAATGAAATCTACGAAATTAAAGTTGACACTTTCCAGCAGTTGCTTAGCCTCCGATCGCTG 
AATTTGGCTTGGAACAAAATTGCTATTATTCACCCCAATGCATTTTCCACTTTGCCATCCCTAATAAAGCTGGACCTATCG 
TCCAACCTCCTGTCGTCTTTTCCTATAACTGGGTTACATGGTTTAACTCACTTAAAATTAACAGGAAATCATGCCTTACAG 
AGCTGGATATCATCTGAAAACTTTCCAGAACTCAAGGTXATAGAAATGCCTTATGCTTACCAGTGCTGTGCATTTGGAGTG 
TGTGAGAATGCCTATAAGATTTCTAATCAATGGAATAAAGGTGACAACAGCAGTATGGACGACCTTCATAAGAAAGATGCT 
GGAATGTTTCAGGCTCAAGATGAACGTGACCTTGAAGATTTCCTGCTTGACTTTGAGGAAGACCTGAAAGCCCTTCATTCA 
GTGCAGTGTTCACCTTCCCCAGGCCCCTTCAAACCCTGTGAACACCTGCTTGATGGCTGGCTGATCAGAATTGGAGTGTGG 
ACCATAGCAGTTCTGGCACTTACTTGTAATGCTTTGGTGACTTCAACAGTTTTCAGATCCCCTCTGTACATTTCCCCCATT 
AAACTGTT^TTGGGGTCATCGCAGCAGTGAACATGCTCACGGGAnTrTrrAGTGrrGTGCTGGCTGGTGTGGATGCGTTC 
ACTTTTGGCAGCTTTGCACGACATGGTGCCTGGTGGGAGAATGGGGTTGGTTGCCATGTCATTGGTTTTTTGTCCATTTTT 
GCTTCAGAATCATCTGTTTTCCTGCTTACTCTGGCAGCCCTGGAGCGTGGGTTCTCTGTGAAATATTCTGCAAAATTTGAA 
ACGAAAGCTCCATTTTCTAGCCTGAAAGTAATCATTTTGCTCTGTGGCCTGCTGGCCTTGACCATGGCCGCAGTTCCCCTG 
CTGGGTGGCAGCAAGTATGGCGCCTCCCCTCTCTGCCTGCCTTTGCCTTTTGGGGAGCCCAGCACCATGGGCTACATGGTC 
GCTCTCATCTTGCTCAATTCCCTTTGCTTCCTCATGATGACCATTGCCTACACCAAGCTCTACTGCAATTTGGACAAGGGA 
GACCTGGAGAATATTTGGGACTGCTCTATGGTAAAACACATTGCCCTGTTGCTCTTCACCAACTGCATCCTAAACTGCCCT 
GTGGCTTTCTTGTCCTTCTCCTCTTTAATAAACCTTACATTTATCAGTCCTGAAGTAATTAAGTTTATCCTTCTGGTGGTA 
GTCCCACTTCCTGCATGTCTCAATCCCCTTCTCTACATCTTGTTCAATCCTCACTTTAAGGAGGATCTGGTGAGCCTGAGA 
AAGCAAACCTACGTCTGGACAAGATCAAAACACCCAAGCTTGATGTCAATTAACTCTGATGATGTCGAAAAACAGTCCTGT 
GACTCAACTCAAGCCTTGGTAACCTTTACCAGCTCCAGCATCACTTATGACCTGCCTCCCAGTTCCGTGCCATCACCAGCT 
TATCCAGTGACTGAGAGCTGCCATCTTTCCTCTGTGGCATTTGTCCCATGTCTCTAA 



>amino acid sequence of LGR5 (total 693 amino acids) (SEQ ID NO: 04) 

LHLHNNRIHSLGKXCFDGLHSLETLDLNYN^ 
VGRSAFQHLPEIjRTLTLNGASQITEFPDLTGTANLESLTL^ 

qkidlrhneiyeikvdtfqqllslrslniawnkiaiihpnafstlpsl^ 
swissenfpelkviempyayqccafgvcenaykisnqwnkgdnss^dl^ 
vqcspspgpfkpcehlldgwlirigwtiavi^tcnalvtstvfrsplyispikxligviaavnm 
tfgs farhgawwengvgchvigfls I fasessvflltlj^ 

LGGSKYGASPLCLPLPFGEPSTMGYMVALILI^SLCFL^^ 

VAFLSFSSLINLTFISPEVIKFILLVWPLPACI^PLLYILFNPHFKEDLVSLRKQTYWTRSKHPSLMSINSDDVEKQSG 
DSTQALVTFTSSSITYDLPPSSVPSPAYPVTESCHLSSVAFVPCL 

>Final LGR7 (LGR7-Long variant) full length sequence (2467 nt) (SEQ ID NO:05)

GAAAGGAGGPJ^GAAAAAAAGAGGAATGGAAAGAGACAGAGAAAGGAAATGGGAGTGGAAGGAGGGAGGACTGCTTT 

GTAACTGCTAAGATTGCAGACAGAAATAG CACACAA3CACTGTGAGCTGTATGCGATTCAGAAACCAAGACCAAATT 

TTGCTCACTTTCATTAATCAGTTGCTCAGATAGAAGGAAATGACATCTGGTTCTGTCTTCTTCTACATCTTAATTTT 

TGGAAAATATTTTTCTCATGGGGGTGGACAGGATGTCAAGTGCTCCCTTGGCTATTTCCCCTGTGGGAACATCACAA 

AGTGCTTGCCTCAGCTCCTGCACTGTAACGGTGTGGACGACTGCGGGAATCAGGCCGATGAGGACAACTGTGGAGAC 

AACAATGGATGGTCCATGCAATTTGACAAATATTTTGCCAGTTACTACAAAATGACTTCCCAATATCCTTTTGAGGC 

AGAAACACCTGAATGTTTGGTCGGTTCTGTGCCAGTGCAATGTCTTTGCCAAGGTCTGGAGCTTGACTGTGATGAAA 

CCAATTTACGAGCTGTTCCATCGGTTTCTTCAAATGTGACTGCAATGTCACTTCAGTGGAACTTAATAAGAAAGCTT 

CCTCCTGATTGCTTCAAGAATTATCATGATCTTCAGAAGCTGTACCTGCAAAACAATAAGATTACATCCATCTCCAT 

CTATGCTTTCAGAGGACTGAATAGCCTTACTAAACTGTATCTCAGTCATAACAGAATAACCTTCCTGAAGCCGGGTG 

TTTTTGAAGATCTTCACAGACTAGAATGGCTGATAATTGAAGATAATCACCTCAGTCGAATTTCCCCACCAACATTT 

TATGGACTAAATTCTCTTATTCTCTTAGTCCTGATGAATAACGTCCTCACCCGTTTACCTGATAAACCTCTCTGTCA 

ACACATGCCAAGACTACATTGGCTGGACCTTGAAGGCAACCATATCCATAATTTAAGAAATTTGACTTTTATTTCCT 

GCAGTAATTTAACTGTTTTAGTGATGAGGAAAAACAAAATTAATCACTTAAATGAAAATACTTTTGCACCTCTCCAG 

AAACTGGATGAATTGGATTTAGGAAGTAATAAGATTGAAAATCTTCCACCGCTTATATTCAAGGACCTGAAGGAGCT 

GTCACAATTGAATCTTTCCTATAATCCAATCCAGAAAATTCAAGCAAACCAATTTGATTATCTTGTCAAACTCAAGT 

CTCTCAGCCTAGAAGGGATTGAAATTTCAAATATCCAACAAAGGATGTTTAGACCTCTTATGAATCTCTCTCACATA

TATTTTAAGAAATTCCAGTACTGTGGGTATGCACCACATGTTCGCAGCTGTAAACCAAACACTGATGGAATTTCATC 

TCTAGAGAATCTCTTGGCAAGCATTATTCAGAGAGTATTTGTCTGGGTTGTATCTGCAGTTACCTGCTTTGGAAACA 

TTTTTGTCATTTGCATGCGACCTTATATCAGGTCTGAGAACAAGCTGTATGCCATGTCAATCATTTCTCTCTGCTGT 

GCCGACTGCTTAATGGGAATATATTTATTCGTGATCGGAGGCTTTGACCTAAAGTTTCGTGGAGAATACAATAAGCA

TGCGCAGCTGTGGATGGAGAGTACTCATTGTCAGCTTGTAGGATCTTTGGCCATTCTGTCCACAGAAGTATCAGTTT 

TACTGTTAACATTTCTGACATTGGAAAAATACATCTGCATTGTCTATCCTTTTAGATGTGTGAGACCTGGAAAATGC 

AGAACAATTACAGTTCTGATTCTCATTTGGATTACTGGTTTTATAGTGGCTTTCATTCCATTGAGCAATAAGGAATT 

TTTCAAAAACTACTATGGCACCAATGGAGTATGCTTCCCTCTTCATTCAGAAGATACAGAAAGTATTGGAGCCCAGA 

TTTATTCAGTGGCAATTTTTCTTGGTATTAATTTGGCCGCATTTATCATCATAGTTTTTTCCTATGGAAGCATGTTT

TATAGTGTTCATCAAAGTGCCATAACAGCAACTGAAATACGGAATCAAGTTAAAAAAGAGATGATCCTTGCCAAACG 

TTTTTTCTTTATAGTATTTACTGATGCATTATGCTGGATACCCATTTTTGTAGTGAAATTTCTTTCACTGCTTCAGG 

TAGAAATACCAGGTACCATAACCTCTTGGGTAGTGATTTTTATTCTGCCCATTAACAGTGCTTTGAACCCAATTCTC

TATACTCTGACCACAAGACCATTTAAAGAAATGATTCATCGGTTTTGGTATAACTACAGACAAAGAAAATCTATGGA

CAGCAAAGGTCAGAAAACATATGCTCCATCATTCATCTGGGTGGAAATGTGGCCACTGCAGGAGATGCCACCTGAGT 

TAATGAAGCCGGACCTTTTCACATACCCCTGTGAAATGTCACTGATTTCTCAATCAACGAGACTCAATTCCTATTCA 
TGA 



>Final LGR7 (LGR7-long variant, total 757 amino acids) (SEQ ID NO:06)

MTSGSVFFYILIFGKYFSHGGGQDVKCSLGYFPCGNITKCLPQLLHCNGVDDCGNQADEDNCGDNNGWSMQFDKYFA
SYYKMTSQYPFEAETPECLVGSVPVQCLCQGLELDCDETNLRAVPSVSSNVTAMSLQWNLIRKLPPDCFKNYHDLQK
LYLQNNKITSISIYAFRGLNSLTKLYLSHNRITFLKPGVFEDLHRLEWLIIEDNHLSRISPPTFYGLNSLILLVLMN
NVLTRLPDKPLCQHMPRLHWLDLEGNHIHNLRNLTFISCSNLTVLVMRKNKINHLNENTFAPLQKLDELDLGSNKIE
NLPPLIFKDLKELSQLNLSYNPIQKIQANQFDYLVKLKSLSLEGIEISNIQQRMFRPLMNLSHIYFKKFQYCGYAPH
VRSCKPNTDGISSLENLLASIIQRVFVWVVSAVTCFGNIFVICMRPYIRSENKLYAMSIISLCCADCLMGIYLFVIG
GFDLKFRGEYNKHAQLWMESTHCQLVGSLAILSTEVSVLLLTFLTLEKYICIVYPFRCVRPGKCRTITVLILIWITG
FIVAFIPLSNKEFFKNYYGTNGVCFPLHSEDTESIGAQIYSVAIFLGINLAAFIIIVFSYGSMFYSVHQSAITATEI
RNQVKKEMILAKRFFFIVFTDALCWIPIFVVKFLSLLQVEIPGTITSWVVIFILPINSALNPILYTLTTRPFKEMIH
RFWYNYRQRKSMDSKGQKTYAPSFIWVEMWPLQEMPPELMKPDLFTYPCEMSLISQSTRLNSYS*

>Final LGR7 (LGR7-Short variant) full length sequence (3584 nt) (SEQ ID NO:07)

— T 1 ^ CTTT^ ^ .r^vGTG ^TAA jnTTGCAortCA3rtj-iATAG rACAC.-i.- L3CACT3TGAGCTGTATGCGA.TT , CA3-^A.A , C , GAAGi\ 
3 C A. VAT T T T 3 C T C A 3 TT T 3 A T T .-J"-. T C A 3 T T 3 C T C AG A.T A G AA G 3 AAAT G A C AT C T 3 G T T C T G T 3 T T 3 T T 3 T A C AT G T 
T-f-iA. ""T ^ ^ ^ j'jAttAAT/i ^ TT ^ TCTC.n j. o- j ^'j ^TuuMCA^Grt i 3A.*vG . G 3 - G 3CTTGGCTATTT - -— CC 3T3 TGGbA/iL 
A.TCj iCAAAGTGCTTGCCTCAGCTCCTG 3ACTGTAACGGTGTGGACGACTG 3G 3GAATCAGGC 3G AT GAGG AC AACTG 
T3TC;GTG3TTTT3TGGGAGTGGATGTGTTT3GGAGGTGTGGA3GTTGAGTGGATGAAACCATTTA3GAGTGTTCCAT 
C 3GTTT CTTGAAlATGTGACTG 3AATGT TA CTTCAGTGGAACTTAATAA3AAA 3 CTT CCTCCT 3 ATT 3 3 TT CAAGAAT 
TATCAT3ATGTT CAGAAGCTGGACCTG 3AAAACAATAAGATTACAT CCATCT CCATCTATGCTTT 3 AG AG 3AGTGAA 
TAGCCTTAGTAAACTGTATCT CAGTCATAACAGAATAACCTT C 3TGAAGCCGGGTGTTTTTGAAGATCTT 3ACAGAC 
TAG AATGGCTGATAATTGAAGATAATC AC CTCAGTCGAATTT 3 CCC.ACC AACATTTT'ATGGACTAAATTCTCTT ATT 
GTCTTA3TCCTGATGAATAACGTCCTCACCCGTTTACCTGATAAACCTeTCTGTCAACACATGC3AAGACTA3ATTG 
GCTGGACCTTGAAGGCAACCATATCCATAATTTAAGAAATTTGACTTTTATTTCCTGCAGTAATTTAACTGTTTTAG 
TGATGA3GAAAAACAAAATTAATCACTTAAATGAAAATACTTTTGGACCTCTCCAGAAACTGGATGAATTG 3ATTTA 
GGAAGTAATAAGATTGAAAATCTTCCACCGCTTATATTCAAGGACCTGAAGGAGCTGTCACAATT 3AATCTTTCCTA 
TAATCCAATGCAGAAAATTCAAGCAAACCAATTTGATTATCTTGTCAAAGTCAAGTCTCTCAGC CTAGAAGGGATTG 
AAATTT3.AAATATCCAACAAAGGATGTTTAGACCTCTTATGAATCTCTCTCACATATATTTTAAGAAATTCCAGTAC 
TGTGGGTATGCACCACATGTTCGCAGCTGTAAACCAAACACTGATGGAATTTCATCTCTAGAGA.ATCTCTTGGCAAG 
3ATTATT3A3AGAGTATTTGTCTGGGTTGTATCTGCAGTTACGTG3TTTGGAAACATTTTTGTCATTT3CATGCGAC 
CTTATATCAGGTCTGAGAACAAGCTGTATGCCATGTCAATCATTTCTCTCTGCTGTGCCGACTG CTTAATGGGAATA 
TATTTATTCGTGATCGGAGGCTTTGACCTAAAGTTTCGTGGAGAATACAATAAGCATGCGCAGCTGTG3ATGGAGAG 
TACTCATTGTCAGCTTGTAGGATCTTTGGCCATTCTGTCCACAGAAGTATCAGTTTTACTGTTAACATTTCTGACAT 
TGGAAAA.ATACATCTGCATTGTCTATCCTTTTAGATGTGTGAGACCTGGAAAATGCAGAACAATTACA3TTCTGATT 
CTCATTT3GATTACTGGTTTTATAGTGGCTTTCATTCCATTGAGCAATAAGGAATTTTTCAAAAA3TACTATGGCAG 
CAATGGA jTATGCTTCCCTCTTCATTCAGAAGATACAGAAAGTATTGGAGCCCAGATTTATTCAGTGGCAATTTTTC 
TTGGTATTAATTTGGCCGCATTTATCATCATAGTTTTTTCCTATGGAAGCATGTTTTATAGTGTTCATGAAlAGTG C3 

ataaca 3caactgaaatacggaatcaagttaaaaaagagatgatccttgccaaacgttttttctttatagtatttac 
tgatgcattatgctggatacccatttttgtagtgaaatttctttcactgcttcaggtagaaataccaggtaccataa 
cctctt3ggtagtgatttttattctgcccattaacagtgctttgaacccaattctctatactctgaccacaagacca 
tttaaagaaatgattcatcggttttggtataactacagacaaagaaaatctatggacagcaaaggtcagaaaacata 
tgctccatcattcatctgggtggaaatgtggccactgcaggagatgccacctgagttaatgaagccggaccttttca 
cataccectgtgaaatgtcactgatttctcaatcaacgagactcaattcctattcatgactgactctgaaattcatt 
tcttcgcagagaatactgtgggggtgcttcatgagggatttactggtatgaaaatgaataccacaaaattaatttat 
aataataggtaagataaatattttacaaggacatgaggaaaaataaaaatgactaat^ 

ttatatcaataatgtatatatattagtagacattttgcataagaaattaagagaaatctacttcagtaacattcatt 
catttttctaacatgcatttattgagtacccactactatgtgcatagcattgcaatatagtcctggaagtagacagt 

GCAGAA33TTTCAATCTGTAGATAGTGTTTAATGACAAAAGACTATACAAAGTCCATCTGCAGTT3CTAGTTTAAAG 
TAGAGCTTTACCTGTCATGTGCATCAGCAAGAATCATAGGCACTTTTAAATAAAGGTTTAAAGTTTTG3AATACTCA 
GTGTATTTGCATCATAGAAAATGTCTGACTGTTTGCAAAATAATATTCTGTTTTAAGAATCCATCTTAGCTCTCTTT 
AAGTTTCCATACACTTGAGAGCCAACACAACATATTTATTACTAAAAAGATGCTTTGCTAGAAACTCAAAAACAGCA 
CTTCTTTT jGCACTTCCTGCCCAGTTTTCTCTTTGCTTTAAATGAACATCATCATATGGAATTGGAATAGGAGAGTA 

tgagtacg gcagagaagtggatcagaaaaactagaatgaggataaacatttacattagtggaaactcctgaaataaa 

tccttgta.ttgtcagttaactgattttcaacaaggatgccaagacaaaaaggcttttcaacaaacggtgctgtttta 

agaacagagctaagtggtttaattgacccactttagatgggtgaatgttatggtgtgtgaaatatgtgagtaaagca 

gttaaaaggaaaaagagctggaatgcactgattcaggaacttaatttcaggaaggaaaggtctgtatgtacacattt 

cacttt^jagcagaaaatctttcttcaagaaatgactttactttctctttgcactgccagcacgtgagatactaactt 

tttaactagttgttcttctctagtctctacgttattagnattttttgctttcataatgtgaaacctttaagcaggag 

aagaaaatgttttcagatagtttcaaatacnccaaaaatgtttgcaacacaaaaatactc 

ccttattgaatatatagttgtatagntttgttctgaaaaccc 

>Final LGR7-S ORF (722 amino acids) (SEQ ID NO:08)

mtsgsvf fvili fgkyfshgggqd^cslgyfpcgnitkclfqllhckg vt)dcgnqadednc^'\ t l j cqgm3lpgl j el 

dwmkpftsvpsvssijvtamslqwnlirrlppdcfk^ 

kpgvfedlhrlewliiednhlsrispptfyglnslillvlm^ 

fiscs^wl\wrkjjkinhl1ientfaplq 

klkslslegieisniqqrmfrplmnlshiyfkkfqycgyaph^^ 

fgnifvi cmrpy irs enkl yams 1 1 slccadclmgi ylfviggfdlkfrgeyiii-ckaolwmesthcolvgslailste 
vsvllltfltlekyicivypfrcvrpgkcrtitvliliwitgf i 
gaqiysyaiflginlaafiiivfsygsmfys\^qsaitate:?^qvkk^^ 

l l 0 v e i ? g t i t s ww ifilpiijs auj p i l y t l ttr p f keh i h r f wy7 jyr 0 r k s ntd s k g q kt y a p s f i wye ww p l q e m 
ppelmkfd-ftypcemslisqstrlnsys* 

>Alignment of LGR7-L with LGR7-S

Query=LGR7-L 

Sbjct=LGR7-S 

Query: 1   MTSGSVFFYILIFGKYFSHGGGQDVKCSLGYFPCGNITKCLPQLLHCNGVDDCGNQADED 60
           MTSGSVFFYILIFGKYFSHGGGQDVKCSLGYFPCGNITKCLPQLLHCNGVDDCGNQADED
Sbjct: 1   MTSGSVFFYILIFGKYFSHGGGQDVKCSLGYFPCGNITKCLPQLLHCNGVDDCGNQADED 60

Query: 61  NCGDNNGWSMQFDKYFASYYKMTSQYPFEAETPECLVGSVPVQCLCQGLELDCDETN 117
           NC
Sbjct: 61  NC

Query: 118 LRAVPSVSSNVTAMSLQWNLIRKLPPDCFKNYHDLQKLYLQNNKITSISIYAFRGLNSLT 177
             +VPSVSSNVTAMSLQWNLIRKLPPDCFKNYHDLQKL LQNNKITSISIYAFRGLNSLT
Sbjct: 83  FTSVPSVSSNVTAMSLQWNLIRKLPPDCFKNYHDLQKLDLQNNKITSISIYAFRGLNSLT 142

Query: 178 KLYLSHNRITFLKPGVFEDLHRLEWLIIEDNHLSRISPPTFYGLNSLILLVLMNNVLTRL 237
           KLYLSHNRITFLKPGVFEDLHRLEWLIIEDNHLSRISPPTFYGLNSLILLVLMNNVLTRL
Sbjct: 143 KLYLSHNRITFLKPGVFEDLHRLEWLIIEDNHLSRISPPTFYGLNSLILLVLMNNVLTRL 202

Query: 238 PDKPLCQHMPRLHWLDLEGNHIHNLRNLTFISCSNLTVLVMRKNKINHLNENTFAPLQKL 297
           PDKPLCQHMPRLHWLDLEGNHIHNLRNLTFISCSNLTVLVMRKNKINHLNENTFAPLQKL
Sbjct: 203 PDKPLCQHMPRLHWLDLEGNHIHNLRNLTFISCSNLTVLVMRKNKINHLNENTFAPLQKL 262

Query: 298 DELDLGSNKIENLPPLIFKDLKELSQLNLSYNPIQKIQANQFDYLVKLKSLSLEGIEISN 357
           DELDLGSNKIENLPPLIFKDLKELSQLNLSYNPIQKIQANQFDYLVKLKSLSLEGIEISN
Sbjct: 263 DELDLGSNKIENLPPLIFKDLKELSQLNLSYNPIQKIQANQFDYLVKLKSLSLEGIEISN 322

Query: 358 IQQRMFRPLMNLSHIYFKKFQYCGYAPHVRSCKPNTDGISSLENLLASIIQRVFVWVVSA 417
           IQQRMFRPLMNLSHIYFKKFQYCGYAPHVRSCKPNTDGISSLENLLASIIQRVFVWVVSA
Sbjct: 323 IQQRMFRPLMNLSHIYFKKFQYCGYAPHVRSCKPNTDGISSLENLLASIIQRVFVWVVSA 382

Query: 418 VTCFGNIFVICMRPYIRSENKLYAMSIISLCCADCLMGIYLFVIGGFDLKFRGEYNKHAQ 477
           VTCFGNIFVICMRPYIRSENKLYAMSIISLCCADCLMGIYLFVIGGFDLKFRGEYNKHAQ
Sbjct: 383 VTCFGNIFVICMRPYIRSENKLYAMSIISLCCADCLMGIYLFVIGGFDLKFRGEYNKHAQ 442

Query: 478 LWMESTHCQLVGSLAILSTEVSVLLLTFLTLEKYICIVYPFRCVRPGKCRTITVLILIWI 537
           LWMESTHCQLVGSLAILSTEVSVLLLTFLTLEKYICIVYPFRCVRPGKCRTITVLILIWI
Sbjct: 443 LWMESTHCQLVGSLAILSTEVSVLLLTFLTLEKYICIVYPFRCVRPGKCRTITVLILIWI 502

Query: 538 TGFIVAFIPLSNKEFFKNYYGTNGVCFPLHSEDTESIGAQIYSVAIFLGINLAAFIIIVF 597
           TGFIVAFIPLSNKEFFKNYYGTNGVCFPLHSEDTESIGAQIYSVAIFLGINLAAFIIIVF
Sbjct: 503 TGFIVAFIPLSNKEFFKNYYGTNGVCFPLHSEDTESIGAQIYSVAIFLGINLAAFIIIVF 562

Query: 598 SYGSMFYSVHQSAITATEIRNQVKKEMILAKRFFFIVFTDALCWIPIFVVKFLSLLQVEI 657
           SYGSMFYSVHQSAITATEIRNQVKKEMILAKRFFFIVFTDALCWIPIFVVKFLSLLQVEI
Sbjct: 563 SYGSMFYSVHQSAITATEIRNQVKKEMILAKRFFFIVFTDALCWIPIFVVKFLSLLQVEI 622

Query: 658 PGTITSWVVIFILPINSALNPILYTLTTRPFKEMIHRFWYNYRQRKSMDSKGQKTYAPSF 717
           PGTITSWVVIFILPINSALNPILYTLTTRPFKEMIHRFWYNYRQRKSMDSKGQKTYAPSF
Sbjct: 623 PGTITSWVVIFILPINSALNPILYTLTTRPFKEMIHRFWYNYRQRKSMDSKGQKTYAPSF 682

Query: 718 IWVEMWPLQEMPPELMKPDLFTYPCEMSLISQSTRLNSYS 757
           IWVEMWPLQEMPPELMKPDLFTYPCEMSLISQSTRLNSYS
Sbjct: 683 IWVEMWPLQEMPPELMKPDLFTYPCEMSLISQSTRLNSYS 722

FIG. 6

Signal peptide 

LGR4 MPGPLGLLCFLALGLLGSAGPSGA
LGR5 MDTSRLGVLLSLPVLLQLATG
LHR MKQRFSALQLLKLLLLLQPPLPRA 
FSHR MALLLVSLLAFLSLGS G 
TSHR MRPADLLQLVLLLDLPRDLGG 

N- flank cysteine -rich sequence 

LGR4 APPL AA-P S DGDR RVD SGKGLTAVPEGLSAFTQA 

LGR5 GSSPRSGVLLRG P-TK H EPDGRMLLRVD SDLGLSELPSNLS VFTS V

LHR LREAL P-EP N VPDG--ALR-- PGPTAGLTR 

FSHR HHRI H SNRVFL QESKVTEIPSDLPRNAIE 

TSHR MG SSPP E HQEED- - FRVT KDIQRIPSLPPSTQT 

Leucine-rich repeats 

* ► « ► « 

LGR4 DISMNNITQLPED KSFPFLEELQLAGN SL HPKALSG KE KVLTLQ - - Q 

LGR5 DLSMNNISQLLPNPLPSLHFLEELRLAGNA- - TY PKGA TG YS KVLMLQ -- Q 

LHR SLAYLPVKVIPSQ RGLNEVIKIEISQI S- ER EANA DN LN SEILIQ TK - 

FSHR RFVLTKLRVIQKG SGFGDLEKIEISQN V- EV EADV SN PK HEIRIEKAN -

TSHR KLIETHLRTIPSH SNLPNISRI YVS I - VT QQLESHS YN SKVTHIEIR TR - 

► * ► M 

LGR4 RTV- SE IHG SA QS RLDA H- TSV EDS - - FEGLVQLRH WLD S-L- EV VR 

LGR5 RHV- TE LQN RS QS RLDA H- SYV P - SC- FSGLHSLRH WLD A-L- E VQ 

LHR RYIE -G FIN PG KY SIC- TG RKF DVTKVFSSESNFI - EIC LHI - T GN 

FSHR LYIN - E FQN PN QY LIS- TG KHL DVHK- IHSLQKVL - DIQ INIH - ERN 

TSHR TYID -D LKE PL KF GIF- TGLKMF DLTK - VYSTDI FFI EIT PYM- S VN 

► < ► M 

LGR4 PLSN P-TLQA T AL NISSIPDF T LSS W H HN K-IKSLSQHC D LDN-LE 
LGR5 A RS S-ALQAMT AL KIHHIPDY G LSS WW H HN R- IHSLGKKC D LHS - LE
LHR A QGMNNESVT K YG GFEEVQSH - GTT TS E KE VHLEKMHNGA R A-TGPK 

FSHR S VG SFESVI W NK GIQEIHNC - GTQ DE N SD NNLEELPNDV H A-SGPV 
TSHR A QG CNETLT K YN GFTSVQGY - GTK DAVY NK KYLTVIDKDA G VY'SGPS 

► ^ ► <« ► <4 

LGR4 T LNYNYLDEF Q- AIKA PS KELGFHSNSISVI D-GA GGNPL RTIH - DNPLS 
LGR5 T LNYNNLDEF T-AIRT SN KELGFHSNNIRSI E-KA VGNPS ITIHF- DNPIQ

LHR T ISSTKLQAL SYGLESIQR 1 -ATS-SYSLKKL SRET V-N- - LEAT T 

FSHR I ISRTRIHSL SYGLEN KK R - ARSTYN- LKKL TLEKLVA MEAS T 

TSHR L VSQTSVTAL SKGLEH KE I - ARNTWT - LKKL LSLS LH TRAD S 

► M ► ^ 

LGR4 FVGNS AFHNLSDLHCLVIRGASLVQWFPNLTGTVHLESLTLTGTKISS I PDDLCQNQKML 
LGR5 FVGRS AFQHLPELRTLTLNGASQITEFPDLTGTANLESLTLTGAQISS LPQTVCNQLPNL 
LHR 
M ► M

LGR4 RTLDLSYNNIRDLPSFNGCRALEEISLQRNQISLIKENTFQGLTSLRILDLSRNLIREIK
LGR5 QVLDLSYNLLEDLPSFSVCQKLQKIDLRHNEIYEIKVDTFQQLLSLRSLNLAWNKIAIIH
LHR

FSHR

TSHR

► 

LGR4 SGAFAKLGTITNLDVSFNELTSFPTEGLNGLNQLK 

LGR5 PNAFSTLPSLIKLDLSSNLLSSFPITGLHGLTHLK 

LHR 

FSHR 

TSHR 

C-flank cysteine-rich sequence 

LGR4 LVGNFKLKDALAARDFANLRSLSV YAYQ WGCDSLCKLNTEDNSPQEHSVTKEKGA 

LGR5 LTGNHALQSLISSENFPELKVIEM YAYQ GVCENAYKI SNQWNKGDNS SMDDLHKK 

LHR . cu RNLPTKE n NFSHS ISENFSK^CESTVR 

FSHR --SH ANWRRQI SELHPICNKS ILRQEVDYMT 

TSHR -SH KNQKKIRGILESLMCNESSMQSLRQRK 

LGR4 TDAANVTSTAENE HS 

LGR5 DAGMFQAQDERDL DF ■ 

LHR KVSNKTLYSSMLA SE

FSHR QTRGQRSSLAEDN SS 

TSHR S VNALNS PLHQEY ENLGDS IVGYKEKSKFQDTHNNAHYYVFFEEQEDEI IGFGQELKNP 

LGR4 QIIIH T STGA K YLLGSWMI 

LGR5 LLDFEEDLKALHSVQ S SPGP K HLLDGWLI 

LHR LSGWDYEYGFCLPKTPR- A EPDA N DIMGYDFL 

FSHR YSRGFDMTYTEFDYDLCNEWDVT S KPDA N DIMGYNIL 

TSHR QEETLQAFDSHYDYTICGDSEDMV T KSDE N DIMGYKFL 

Transmembrane 

TM 1 TM 2 



LGR4 LTV F FLV LLF LL ILTVFA CSS PAS KLF I GL I S VSNLLM IYTGILTFL AVSW 

LGR5 IGV T AV LTC AL TSTVFR PLYISPIKL IGVIAAVNMLT VSSAVL G AF F 

LHR VLI L NI IMG MT LFVLLT RYK TVPRF MCNLS FADFCM LYLLLI S SQ K 

FSHR VLI F SI ITG II LVILTT QYK TVPRF MCNLAFADLCI IYLLLI S IH K 

TSHR IW FVSL LLG VF LLILLT HYK NVPRF MCNLAFADFCM MYLLLI S LY H 

TM 3 



LGR4 GRFAEFG WE S KV SLA S SA FL LAAV SVFAKDLMKHGKSSH QF 

LGR5 GSFARHGAW EN V HVI LSI S FL LAA GFSVKYSAKFET APFSSL 

LHR GQYYNHA D Q S ST FT L YT VIT WHTITYAIHLDQ LR HA 

FSHR SQYHNYA D Q A DA FT L YT AIT WHT I THAMQLDC VQ HA 

TSHR 5EYYNHA D Q P NT FT L YT VIT WYAI TFAMRLDR IR HA 
TM 4



LGR4 ~> V AAL LA-L LG AA V AG IF r HCG S A S F L

LGR5 FVIILLGALLALTK AY I. G K GAS PI.

LHR ILIMLGGWLFSSLI XL Y V N MKVG I 

FSHR A S VMVMG W I FA F AA LF IF I S MKVS I 

TSHR Z A I ; -TV G G WV C C F L L LI. V I S AKVS I 



TM 5 



LGR4 FPTGETPGLGFTYTLYL GL LLMA

LGR5 LPFGEPSTMG MYALIL SLC LMMT

LHR MDVETTLSQV ILTILI \*V FIIC

FSHR MDIDSPL5QL VriSLLV YL WIC

TSHR MDTETPLALA IVFYLT IV VI VC



TM 6

LGR4 II T L CKL- EKEDLSENSQSSVI HV W NCIFFC VA FSFAPLITAIS SPEI
LGR5 IA T L CNL-DKGDLENIW GSrW HI L L NCILNC VA LSF SLINLTF SPEV
LHR AC I I FAVRNPELMATNK TKIA KK I DFTCMA IS FA I AAFKVFL TVTN
FSHR GC IH I LTVRNPNIVSSSS TRIA RM M DFLCMA IS FAI ASLKVPL TVSK
TSHR CCHV ITVRNPQYNPGDK TKIA RM V DFICMA IS YAL AILNKPL TVSN


TM 7 



LGR4 H SVTLI F LPA L V VF N
LGR5 I FI LVW LPA L IL N
LHR S VL VL Y INS A P AI T
FSHR A IL VL K INS A F AI T
TSHR S IL VL Y LNS A AI



C-terminal tail 



LGR4 PK KE WKL KRRVTRKHGSVSVSISSQGGCGEQDFYYDCGMYSHLQGNLTVCDCCESFL
LGR5 PH KE LVS RKQTYVWTRSKHPSLMSINSDDVEKQSCDSTQALVTFTSSSITYDLPPSS
LHR KT QR FFL LSKFGCCKJ^RAELY r RRKDFSAYTSNCKNGFTGSNKPSOSTLKLSTLHCQG
FSHR KN RR FFI LSKCGCYEMQAQIYRTETSSTVHNTHPRNGHCSSAPRVTNGSTYILVPLS
TSHR YJk OR VFI LSKFGICKRQAQAYRGORVPPKNSTDIQVQr^THDMROGLr^EDWELI



LGR4 LTKPVSCKHLIKSHSCPVLTAASCQRPEAYWSDCGTQSAHSDYADEEDSFV T SDSSDQVQA 

LGR5 VPSPAYPVTESCHLSSVAFVPCL

LHR TALLDKTRYTEC 

FSHR KLAQN 

TSHR ENSHLTPKKQGQISEEYMQTVL 

LGR4 CGRACFYQSRGFPLVRYAYNLQRVRD 
Repeat Regions

<130> SUN-84PCT
<160> 8

<170> FastSEQ for Windows Version

<210> 1

<211> 2856

<212> DNA

<213> human

<400> 1
atgccgggcc cgctagggct gctctgcttc ctcgccctgg ggctgctcgg ctcggccggg 60
cccagcggcg cggcgccgcc tctctgcgcg gcgccctgca gctgcgacgg cgaccgtcgg 120
gtggactgct ccggaaaggg gttgacggcc gtaccggagg gtctcagcgc cttcacccaa 180
gcactggata tcagtatgaa caatatcacc cagttaccag aagatgcatt taagagtttc 240
ccatttctag aggagctaca actggctggt aacgaccttt ctcttatcca tccaaaagcc 300
ttgtctgggc tgaaagaact caaagtccta acactccaga ataatcagtt gagaacagtg 360
cccagtgaag ccattcacgg actgagtgct ttgcagtctt tacgcttaga tgccaaccat 420
attacctcag tcccggagga cagttttgaa gggcttgtcc agttacgcca tctgtggctg 480
gatgacaaca gcttgacgga agtgcccgtg cgtcccctca gcaacctgcc aaccctgcag 540
gcgctgacct tggctctcaa caacatctca agcatccctg acttcgcttt caccaacctt 600
tcaagcttgg tggttctgca tctgcataac aataaaatta aaagcctcag tcaacactgt 660
tttgatggac tagataacct ggaaaccttg gacttgaatt acaattactt ggatgagttt 720
cctcaggcta ttaaagccct tcccagcctt aaagagctgg gatttcacag taattctatt 780
tctgttattc ctgatggagc atttggtggt aatccactgc taagaactat tcatttgtat 840
gataatcctc tgtcttttgt ggggaactca gcatttcaca acctgtctga tctgcattgc 900
ttagtcattc ctggtgcaag cctggtgcag tggttcccca atctgaccgg aactgtccat 960
ttggagagtc taaccttgac agggacaaaa ataagcagca tacctgatga tctgtgccaa 1020
aaccaaaaga tgctgaggac tctggactta tcttataaca atataagaga ccttccaagt 1080
tttaatggtt gtcgtgcatt ggaagaaatt tcattgcagc gtaatcaaat ctccctaata 1140
aaggaaaata cttttcaagg cctaacatct ctaaggattc tagatctgag tagaaacctg 1200
atccgtgaaa ttcacagtgg agcttttgcg aagcttggga caattactaa cctggatgta 1260
agtttcaatg aattaacttc atttcctacg gaaggcctaa atgggctcaa tcaactaaag 1320
cttgtgggta acttcaagct gaaagacgcc ttggcagcca gagactttgc taatctcagg 1380
tctctatcag taccatatgc ttatcagtgt tgtgcatttt gggggtgtga ctctttatgc 1440
aaattaaaca cagaagataa cagcccccaa gaacacagtg tgacaaaaga gaaaggtgct 1500
acagatgcag caaatgtcac cagcactgct gagaacgaag aacatagcca aataattatc 1560
cactgtacac cttcaacagg tgctttcaag ccctgtgaat atttactggg aagctggatg 1620
attcgcctta cagtgtggtt cattttcctg gtcgccttgc ttttcaacct gcttgtcatt 1680
ttaacagtgt ttgcgtcttg ttcatcactg cctgcctcca aactcttcat aggcttgatt 1740
tctgtgtcta acttactcat gggcatctat actggcatcc ttacttttct tgatgctgtg 1800
tcctggggcc gatttgccga atttggcatt tggtgggaaa ctggcagcgg ctgcaaggta 1860
gccgggtctc tggcagtctt ctcctcagag agcgctgtat tcttattaac actggcagct 1920
gtggaaagaa gcgtatttgc aaaggatttg atgaaacacg ggaagagcag tcacctcaga 1980
cagttccagg tggccgccct cttagctttg ctgggtgccg cagtggcagg ctgcttcccc 2040
cttttccacg gagggcaata ttctgcatcg cccttgtgtt tgccgtttcc tacaggagaa 2100
cttccaccgc ttatattcaa 1140
ggacctgaag gagctgtcac aattgaatct ttcctataat ccaatccaga aaattcaagc 1200
aaaccaattt gattatcttg tcaaactcaa gtctctcagc ctagaaggga ttgaaatttc 1260
aaatatccaa caaaggatgt ttagacctct tatgaatctc tctcacatat attttaagaa 1320
attccagtac tgtgggtatg caccacatgt tcgcagctgt aaaccaaaca ctgatggaat 1380
ttcatctcta gagaatctct tggcaagcat tattcagaga gtatttgtct gggttgtatc 1440
tgcagttacc tgctttggaa acatttttgt catttgcatg cgaccttata tcaggtctga 1500
gaacaagctg tatgccatgt caatcatttc tctctgctgt gccgactgct taatgggaat 1560
atatttattc gtgatcggag gctttgacct aaagtttcgt ggagaataca ataagcatgc 1620
gcagctgtgg atggagagta ctcattgtca gcttgtagga tctttggcca ttctgtccac 1680
agaagtatca gttttactgt taacatttct gacattggaa aaatacatct gcattgtcta 1740
tccttttaga tgtgtgagac ctggaaaatg cagaacaatt acagttctga ttctcatttg 1800
gattactggt tttatagtgg ctttcattcc attgagcaat aaggaatttt tcaaaaacta 1860
ctatggcacc aatggagtat gcttccctct tcattcagaa gatacagaaa gtattggagc 1920
ccagatttat tcagtggcaa tttttcttgg tattaatttg gccgcattta tcatcatagt 1980
tttttcctat ggaagcatgt tttatagtgt tcatcaaagt gccataacag caactgaaat 2040
acggaatcaa gttaaaaaag agatgatcct tgccaaacgt tttttcttta tagtatttac 2100
tgatgcatta tgctggatac ccatttttgt agtgaaattt ctttcactgc ttcaggtaga 2160
aataccaggt accataacct cttgggtagt gatttttatt ctgcccatta acagtgcttt 2220
gaacccaatt ctctatactc tgaccacaag accatttaaa gaaatgattc atcggttttg 2280
gtataactac agacaaagaa aatctatgga cagcaaaggt cagaaaacat atgctccatc 2340
attcatctgg gtggaaatgt ggccactgca ggagatgcca cctgagttaa tgaagccgga 2400
ccttttcaca tacccctgtg aaatgtcact gatttctcaa tcaacgagac tcaattccta 2460
ttcatga 2467

<400>
Met Thr Ser Gly Ser Val Phe Phe Tyr Ile Leu Ile Phe Gly Lys Tyr
1               5                   10                  15

Phe Ser His Gly Gly Gly Gln Asp Val Lys Cys Ser Leu Gly Tyr Phe
            20                  25                  30

Pro Cys Gly Asn Ile Thr Lys Cys Leu Pro Gln Leu Leu His Cys Asn
        35                  40                  45

Gly Val Asp Asp Cys Gly Asn Gln Ala Asp Glu Asp Asn Cys Gly Asp
    50                  55                  60

Asn Asn Gly Trp Ser Met Gln Phe Asp Lys Tyr Phe Ala Ser Tyr Tyr
65                  70                  75                  80

Lys Met Thr Ser Gln Tyr Pro Phe Glu Ala Glu Thr Pro Glu Cys Leu
                85                  90                  95

Val Gly Ser Val Pro Val Gln Cys Leu Cys Gln Gly Leu Glu Leu Asp
            100                 105                 110

Cys Asp Glu Thr Asn Leu Arg Ala Val Pro Ser Val Ser Ser Asn Val
        115                 120                 125

Thr Ala Met Ser Leu Gln Trp Asn Leu Ile Arg Lys Leu Pro Pro Asp
    130                 135                 140
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Thr joi H 7ei li'.r 'I'.': AH Aru -J J 



7H Pt.e G_u Aso u iL : .3 : ■ : Lea . ; : u ::r He, lie . e els .• . s : A./;, 

i : .• ..• 1 - a Ser _Ar :: i .- e : • ; ." i r .■ Thr he Tyi Gl y A; :. 7- r h • : 

:. i: lh ...... 

Ile Leu Leu Val Leu Met Asn Asn Val Leu Thr Arg Leu Pro Asp Lys
225 230 235 240

Pro Leu Cys Gln His Met Pro Arg Leu His Trp Leu Asp Leu Glu Gly
245 250 255

Asn His Ile His Asn Leu Arg Asn Leu Thr Phe Ile Ser Cys Ser Asn
260 265 270

Leu Thr Val Leu Val Met Arg Lys Asn Lys Ile Asn His Leu Asn Glu
275 280 285

Asn Thr Phe Ala Pro Leu Gln Lys Leu Asp Glu Leu Asp Leu Gly Ser
290 295 300

Asn Lys Ile Glu Asn Leu Pro Pro Leu Ile Phe Lys Asp Leu Lys Glu
305 310 315 320

Leu Ser Gln Leu Asn Leu Ser Tyr Asn Pro Ile Gln Lys Ile Gln Ala
325 330 335

Asn Gln Phe Asp Tyr Leu Val Lys Leu Lys Ser Leu Ser Leu Glu Gly
340 345 350

Ile Glu Ile Ser Asn Ile Gln Gln Arg Met Phe Arg Pro Leu Met Asn
355 360 365

Leu Ser His Ile Tyr Phe Lys Lys Phe Gln Tyr Cys Gly Tyr Ala Pro
370 375 380

His Val Arg Ser Cys Lys Pro Asn Thr Asp Gly Ile Ser Ser Leu Glu
385 390 395 400

Asn Leu Leu Ala Ser Ile Ile Gln Arg Val Phe Val Trp Val Val Ser
405 410 415

Ala Val Thr Cys Phe Gly Asn Ile Phe Val Ile Cys Met Arg Pro Tyr
420 425 430

Ile Arg Ser Glu Asn Lys Leu Tyr Ala Met Ser Ile Ile Ser Leu Cys
435 440 445

Cys Ala Asp Cys Leu Met Gly Ile Tyr Leu Phe Val Ile Gly Gly Phe
450 455 460

Asp Leu Lys Phe Arg Gly Glu Tyr Asn Lys His Ala Gln Leu Trp Met
465 470 475 480

Glu Ser Thr His Cys Gln Leu Val Gly Ser Leu Ala Ile Leu Ser Thr
485 490 495

Glu Val Ser Val Leu Leu Leu Thr Phe Leu Thr Leu Glu Lys Tyr Ile
500 505 510

Cys Ile Val Tyr Pro Phe Arg Cys Val Arg Pro Gly Lys Cys Arg Thr
515 520 525

Ile Thr Val Leu Ile Leu Ile Trp Ile Thr Gly Phe Ile Val Ala Phe
530 535 540

Ile Pro Leu Ser Asn Lys Glu Phe Phe Lys Asn Tyr Tyr Gly Thr Asn
545 550 555 560

Gly Val Cys Phe Pro Leu His Ser Glu Asp Thr Glu Ser Ile Gly Ala
565 570 575

Gln Ile Tyr Ser Val Ala Ile Phe Leu Gly Ile Asn Leu Ala Ala Phe
580 585 590

Ile Ile Ile Val Phe Ser Tyr Gly Ser Met Phe Tyr Ser Val His Gln
595 600 605



S 



WO 99/48921 



PCT/US99/06573 



610 615 620

625 630 635 640

Trp Ile Pro Ile Phe Val Val Lys Phe Leu Ser Leu Leu Gln Val Glu
645 650 655

Ile Pro Gly Thr Ile Thr Ser Trp Val Val Ile Phe Ile Leu Pro Ile
660 665 670

Asn Ser Ala Leu Asn Pro Ile Leu Tyr Thr Leu Thr Thr Arg Pro Phe
675 680 685

Lys Glu Met Ile His Arg Phe Trp Tyr Asn Tyr Arg Gln Arg Lys Ser
690 695 700

Met Asp Ser Lys Gly Gln Lys Thr Tyr Ala Pro Ser Phe Ile Trp Val
705 710 715 720

Glu Met Trp Pro Leu Gln Glu Met Pro Pro Glu Leu Met Lys Pro Asp
725 730 735

Leu Phe Thr Tyr Pro Cys Glu Met Ser Leu Ile Ser Gln Ser Thr Arg
740 745 750

Leu Asn Ser Tyr Ser
755


<210> 7

<211> 3584

<212> DNA

<213> human



<400> 7

crgctttgta actgetaaga ttgcagacag aaatagcaca caaocactgo gagctgtatg 60 

cgattcagaa accaagacca aartttgetc acttt:atta ateagttget cagatagaag 120 

gaaatgacat ctggttctqt cttcttcrac atcttaattt ttggaaaata tttttctca: 180 

gggggtggac aggatgtcaa gtgctccctt ggctarttcc cctgtgggaa catcacaaag 240 

tgcttgcc:c agctcctgca ctgtaacggt gtggacgact gcgggaat :a ggecgatgag 300

gacaactgtg tggtggtttt gtgccagtgc atgtctttgc caggtctgga gcttgactgg 360 

argaaaccat ttacgagtgt tecateggtt tcttcaaatg tga:tgcaa: gtcacttcag 420 

tggaacttaa taagaaagct tcctcctgat tgcttcaaga attatcatga tcttcagaag 480 

ctggacctgc aaaacaataa gattacatcc atctccatc: atgctttcag aggactgaa: 540 

agecttacta aactgtatct cagtcataac agaataacct tccigaagcc gggtgtttt: 600 

aaaqatc::c acagactaga atggctgata attgaagata atcacctcag tcgaatttcc *>b0 

c:accaacat tttatggact aaattct:tt attctcttag tc:tgatgaa taaegtccto 720

acccgtttac ctgataaacc octctgtcaa cacatgccaa gactacattg gctggacctt 780

gaaggcaacc atatccataa tttaagaaat :tgactt:ta tttcctgcag taatttaact 840 

gttttagtga tgaggaaaaa oaaaattaat cacttaaatg aaaatacttt tgcacctctc 900 

cagaaactgg atgaattgga tttaggaagt aataagattg aaaatcttc: acegcttata 960 

ttcaaggaoc tgaaggagrt gtcacaattg aatctttcct ataatccaat ccagaaaatt 1020 

caagcaaa:c aarttgatta tcttgtcaaa ctcaagtctc teagectaga agggattgaa 1080

atttcaaata tccaacaaag gatgtttaga cctcttatga atctctctca catatatttt 1140 

aagaaar::: agtactgrgg g oat g caeca catgttcgca gctgtaaacc aaacactgat 1200 

ggaatttcat ctctagagaa tctcttggca agcattattc agagagtatt tgtctgggtt 1260 

gtatctgcag ttacctgett tggaaacatt tttgtcattt qcatgcgacc rtaoatcagg 1320 

tctgaqaaca agctgtatgc oatgtcaatc atttctctct gctgtgccga ctgcttaatg 1380 

ggaatatatt tattegtgat eggaggcttt gacctaaagt ttcgtggaga atacaataag 1440 

catgcgcagc tgtggatgga gagtactcat tgtcagcttg taggatcttt ggccattctg 1500 

tccacagaag tatcagtttt actgttaaca tttctgacat tggaaaaata catctgeatt 1560 

gtctatcctt ttagatgtgt gagacctgga aaatgcagaa caattacagt tctgattctc 1620 

atttggatta ctggttttat agtggctrtc attccattga gcaataagga atttttcaaa 1680 

aactactatg gcaccaatgg agtatgette cctcttcatt cagaagatac agaaagtatt 1740 

ggageccaga tttattcagt ggcaattttt ettggtatta atttggcege atttatcatc 1800 

atagtttttt cctatggaag catgttttat agtgttcatc aaagtgccat aacagcaact 1860 

gaaatacgga atcaagttaa aaaagagatg atccttgcca aacgtttttt ctttatagta 1920 
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t; t ^ ™ t: ja: .: 
■ q v. a a <a a : 1 t a c 
t:a a ace 
:r.r.tgg c a t a 
;:atc3:.:ca 

recta:': ca t 

ataaa:a::t 
ataa::^:at 
tctac:t:ag 
gtgeatagea 
tagtgtttaa 
::ttac:tg: 
ttggaat a cc 
cgttttaaga 
atat:tat:a 
:cctac:ca q 
gtatqaar a c 
gtqqaaact z 
aagacaaaaa 
tcacccactt 
aggaaaaaga 
tgtacacatt 
gcactgccag 
tattagr.att 
qatagtttca 
geccttattg 



:a::. ; : a a : a 

C J.; 2. G I J; 2 2 a * 

caait /: :\ ■. 
a ct a aaaa a a 
t; at j:q: ::u 
t:a::ta:: a 
aa a i 2': ; 
:.acrg:t:a: 
:a:aa:^aa; 
c a a t a a t a t a 
taa :atrcar 
ttgcaataia 
tgacaaaaga 
catgtgcar: 
cag:g:a::: 
atc:a:ctta 
ctaaaaaaa: 
ttt: ::ctt:: 
ggcagagaa a 
ctoaaataa a 
ggcrtttcaa 
taga-gggtg 
gctggaarg: 
tca::ttaag 
c a c gtgacat 
ttttgett t a 
aat aenccaa 
aatatatagt 



a a j i j a a a i a r 
•. i ' 'it a a a a. : 
a *: at gaaa t a 
": a . i a r \ ca:: :. 
j a a a a a aa:a 
*.. a a 3 a a a a a a 
:a:at.a : * ... a 
:catt:::c: 
gt cct ggaag 
:r.atacaaag 
aocaagaa: a 
acarcataqa 
"tctcttta 
act t t gctag 
q:ttraaarq 
t g g a t a a a a a 
:cct.:qtatt 
caaaccqtqc 
aatgttatgg 
actgattcag 
cagaaaatct 
actaactttt 
ataatgtgaa 
aaatgtttgc 
tgtatagntt 



t 1. r at aqr. ca 
ac^:cat:: 
a a a a a a c c a t 
a c aaa egg ca 
c a a c a a a a g a 
t:a:::a::t 

c:acaaaa:: 
t a a a a a t g a a 
t a a a c a t 1 1 z 
aa :a:gcatt 
taga-:aqtg: 
tccat ctgea 
at aggcactc 
a a a z g t c t g a 
agc::cca:a 
a a act caaaa 
a a cat catca 
aaactaqaat 
gt cagt taac 
t gtttt aaga 
t gtgtgaaat 
gaact taat t 
ttctt caaga 
taactagttg 
ac cttt aagc 
aacaoaaaaa 
t gtt atgaaa 



a a t r r c: : t a 
• : . .: f ~ 2 1 g c a 
a a a a a a a a a a 
a a g c r caaaa 
tgccacetaa 
c z c a a t c a a a 
a a a t a at at a 
■a a 1 1 1 a t a a : 
a cat a at cct 
q a a t a a q a a a 
tattgaatac 
a g a a c a 1 1 1 c 
gt tcecaata 
1 1 a a a t a a a g 
ctgt ttgcaa 
ca cttgagag 
a cage a :t t c 
t ata aaattg 
g a ggat aaac 
t gattttcaa 
acagacctaa 
at :tcagtaa 
tcaggaagga 
aat gacttta 
tt ctt ctcta 
agga aaagaa 
t a c t g g a a t c 
accc 



a : V ' 



1 1 a a a 
a cagt 
■a a 1 1 c a t c a g 
a ^ c:it at get 
c r :. a a t aaa a 
a a a a c t taat 
7 7 a a t g c 1 1 c 
a a t a ac t aag 
■a caaag agaa 
1 1 a a aaga a a 
scactactat 
aatct gtaga 
Laaagtagag 
qt 1 1 aaagt t 
aat aat at t c 
z c a a c a c a a c 
1 1 1 1 ggcact 
qaataqqaqa 
at t t a ca tt a 
eaaggatgee 
gt ggtttaat 
agcagt taaa 
aaqqtctgta 
cttt ctcttt 
gt :tc:acqt 
aatqttttca 
naaccaraat 
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Gly 
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Tyr 
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Asp 
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35 

Va. Asp 
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Leu Cys 
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Tyr His 

115 

lie Ser 
1 3 a 

Lea Ser 
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He Ser 



3 
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Gly 
2 0 
As n 
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Th r 

Trp 

10 0 
Asp 

1 1 e 
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Arc 

Pr a 

16 0 
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Ser 
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P. sr. 

Leu 
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Leu 
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Val 

Gly 
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7 0 
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Leu 
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Ala 
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: 5 0 

21 u 
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Phe 
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A a n 
Ser 
Pro 



Phe 

" n. 

a 

lie 
Trp 
Pne 



Phe Tyr he 
10 

Asp Val Lys 
<~ a 

Cys Leu Pa: 

4 ■"' 

Gin Ala A a p 

Leu Pro Gly 

Ser Val Ser 

0 0 

Asrg Lys Lea 
105 

Lea Asp Lea 

Arc Gly Lea 

Thr Phe Leu 

L ea Tie lit 

Tyr Gly Lea 
IBS 



Lea 

C;/3 

Gin 

G L a 

Lea 
7 5 
Se r 

Pr d 

Gin 

Aan 

Lys 
155 



He Pne Gly 



Leu 
Asr, 



Asa 

P r a 

As n 

Ser 
140 
Pr a 



L e a G 1 y 

3 0 

Leu Has 
A 5 

Aan Cys 



Gla Leu Asp 



-/s eyr 
Tyr Phe 
2 22 Asn 



Trp Met 

3 0 



rtSw ber 



Asp Cys 
110 

Asn Lys 
125 

Leu Thr 

Sly Val 

Asn His 

Leu lie 
190 



1 1- 



Met 
Lys 
Thr 



[ : . e G 1 a 

1 60 
L-u Ser 

1 7 a 

L^a Leu 
His Asn Leu Arg Asn Leu Thr Phe Ile Ser Cys Ser Asn Leu Thr Val
225 230 235 240

Leu Val Met Arg Lys Asn Lys Ile Asn His Leu Asn Glu Asn Thr Phe
245 250 255

Ala Pro Leu Gln Lys Leu Asp Glu Leu Asp Leu Gly Ser Asn Lys Ile
260 265 270

Glu Asn Leu Pro Pro Leu Ile Phe Lys Asp Leu Lys Glu Leu Ser Gln
275 280 285

Leu Asn Leu Ser Tyr Asn Pro Ile Gln Lys Ile Gln Ala Asn Gln Phe
290 295 300

Asp Tyr Leu Val Lys Leu Lys Ser Leu Ser Leu Glu Gly Ile Glu Ile
305 310 315 320

Ser Asn Ile Gln Gln Arg Met Phe Arg Pro Leu Met Asn Leu Ser His
325 330 335

Ile Tyr Phe Lys Lys Phe Gln Tyr Cys Gly Tyr Ala Pro His Val Arg
340 345 350

Ser Cys Lys Pro Asn Thr Asp Gly Ile Ser Ser Leu Glu Asn Leu Leu
355 360 365

Ala Ser Ile Ile Gln Arg Val Phe Val Trp Val Val Ser Ala Val Thr
370 375 380

Cys Phe Gly Asn Ile Phe Val Ile Cys Met Arg Pro Tyr Ile Arg Ser
385 390 395 400

Glu Asn Lys Leu Tyr Ala Met Ser Ile Ile Ser Leu Cys Cys Ala Asp
405 410 415

Cys Leu Met Gly Ile Tyr Leu Phe Val Ile Gly Gly Phe Asp Leu Lys
420 425 430

Phe Arg Gly Glu Tyr Asn Lys His Ala Gln Leu Trp Met Glu Ser Thr
435 440 445

His Cys Gln Leu Val Gly Ser Leu Ala Ile Leu Ser Thr Glu Val Ser
450 455 460

Val Leu Leu Leu Thr Phe Leu Thr Leu Glu Lys Tyr Ile Cys Ile Val
465 470 475 480

Tyr Pro Phe Arg Cys Val Arg Pro Gly Lys Cys Arg Thr Ile Thr Val
485 490 495

Leu Ile Leu Ile Trp Ile Thr Gly Phe Ile Val Ala Phe Ile Pro Leu
500 505 510

Ser Asn Lys Glu Phe Phe Lys Asn Tyr Tyr Gly Thr Asn Gly Val Cys
515 520 525

Phe Pro Leu His Ser Glu Asp Thr Glu Ser Ile Gly Ala Gln Ile Tyr
530 535 540

Ser Val Ala Ile Phe Leu Gly Ile Asn Leu Ala Ala Phe Ile Ile Ile
545 550 555 560

Val Phe Ser Tyr Gly Ser Met Phe Tyr Ser Val His Gln Ser Ala Ile
565 570 575

Thr Ala Thr Glu Ile Arg Asn Gln Val Lys Lys Glu Met Ile Leu Ala
580 585 590

Lys Arg Phe Phe Phe Ile Val Phe Thr Asp Ala Leu Cys Trp Ile Pro
595 600 605

Ile Phe Val Val Lys Phe Leu Ser Leu Leu Gln Val Glu Ile Pro Gly
610 615 620

Thr Ile Thr Ser Trp Val Val Ile Phe Ile Leu Pro Ile Asn Ser Ala
625 630 635 640

Leu Asn Pro Ile Leu Tyr Thr Leu Thr Thr Arg Pro Phe Lys Glu Met
645 650 655
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